Generalized Error Exponents
For Small Sample Universal Hypothesis Testing 

Dayu Huang and Sean Meyn 
Abstract 

The small sample universal hypothesis testing problem, where the number of samples $n$ is smaller than the
number of possible outcomes $m$, is investigated in this paper. The goal of this work is to find an appropriate
criterion to analyze statistical tests in this setting. A suitable model for analysis is the high-dimensional model in
which both $n$ and $m$ increase to infinity, and $n = o(m)$. A new performance criterion based on large deviations
analysis is proposed; it generalizes the classical error exponent applicable to large sample problems (in which
$m = O(n)$). This generalized error exponent criterion provides insights that are not available from asymptotic
consistency or central limit theorem analysis. The results are:

(i) The best achievable probability of error decays as $\exp\{-(n^2/m) J (1 + o(1))\}$ for some $J > 0$.

(ii) A class of tests based on separable statistics, including the coincidence-based test, attains the optimal 
generalized error exponents. 

(iii) Pearson's chi-square test has a zero generalized error exponent and thus its probability of error is
asymptotically larger than that of the optimal test.

Index Terms 

Hypothesis testing, large deviations, small sample, separable statistic, error exponent, large alphabet. 

I. Introduction 

As an example of the application of the results, consider the following hypothesis testing problem. An i.i.d.
sequence $Y_1^n = \{Y_1, \dots, Y_n\}$ with $Y_i \in [0,1]$ is observed. There are two hypotheses: Under the null
hypothesis $H_0$, the probability measure induced by $Y_i$ is denoted by $P$. Under the alternative hypothesis $H_1$,
it is only known that the probability measure $Q$ induced by $Y_i$ satisfies $Q \in \mathcal{Q}$. For simplicity of
exposition, we assume in this section that $P$ is absolutely continuous with respect to the Lebesgue measure on
$[0,1]$, and the density is positive almost everywhere; $Q$ is absolutely continuous with respect to $P$.

The goal is to design a test $\phi : [0,1]^n \to \{0,1\}$ with small probabilities of false alarm and missed detection:

$$P_F := P_P\{\phi(Y_1^n) = 1\}, \qquad P_M := \sup_{Q \in \mathcal{Q}} P_Q\{\phi(Y_1^n) = 0\}.$$

We consider a universal hypothesis testing problem, also called goodness of fit. It has the following form of Q: 

$$\mathcal{Q} = \{Q : d(Q,P) \ge \varepsilon\}$$

where $d$ is a distance function that could change with $n$, and $\varepsilon > 0$. As discussed in [3], if the distance
function is the total variation distance or any distance function dominating the total variation distance, then there
is no test that is asymptotically consistent, i.e., for which $P_F \to 0$ and $P_M \to 0$ as $n \to \infty$. On the other
hand, there is a consistent test if the distance function is the total variation distance defined on a finite partition
of $[0,1]$: Let

$$\mathcal{A} = \{A_1, \dots, A_m\}$$

be a partition of $[0,1]$. The total variation distance defined on this partition is given by

$$d_{\mathcal{A}}(Q,P) = \sup \{|Q(A) - P(A)|\}. \tag{1}$$
As the number of observations $n$ increases, it is desirable for a test to not only have a decreasing probability of
error, but also be effective against an increasingly larger alternative set $\mathcal{Q}$. Therefore, we consider a
sequence of distance functions defined with increasingly finer partitions. We restrict ourselves to partitions in which
the cells of the partition have equal probabilities under $P$:

$$P(A_j) = 1/m \quad \text{for } 1 \le j \le m. \tag{2}$$

One reason to consider uniform cells, as argued in [4], is that the total variation distance based on this partition
gives the best possible distinguishability with respect to the Kolmogorov-Smirnov distance: Consider the maximum
Kolmogorov-Smirnov distance between the null distribution and any alternative distribution that has zero
partition-based total variation distance to the null distribution. Then among all partitions with the same number of
cells, the maximum Kolmogorov-Smirnov distance is minimized by the partition with uniform cells.

The dependence between $n$ and $m$ plays a significant role in test analysis and synthesis: the small sample case
in which $n/m \to 0$ has a different nature than the large sample case in which $n/m \to \infty$. In the large sample
case, the number of samples per cell increases to infinity, and thus eventually the underlying probability that $Y_i$
falls in each cell of $\mathcal{A}$ can be estimated. This does not hold for the small sample case, in which $m$
increases faster than $n$. The goal of this paper is to find an appropriate analysis criterion for the small sample problem.

A. Related work 

In this section, we review related results with emphasis on the type of analysis used and the asymptotic settings
considered. Many of the results reviewed apply to cases more general than (2).

Examples of partition-based tests for small sample problems include Pearson's chi-square test, the Generalized
Likelihood Ratio Test (GLRT) and the coincidence-based test proposed in [5].

Existing results differ in the asymptotic setting considered, which can be roughly classified into three cases: 1)
$m$ is fixed; 2) $m$ is increasing and $m = O(n)$; 3) $n = o(m)$ and $m = o(n^2)$. There is no need to consider the case
$n = O(\sqrt{m})$ because the converse result (lower bounds on the probability of error) established in [5] indicates
that no asymptotically consistent test exists if $n = O(\sqrt{m})$.

There are three predominant types of analysis: 

1. Asymptotic consistency / sample complexity analysis: This type of analysis characterizes how fast $m$ can
increase with $n$, while still ensuring that $\limsup_{n\to\infty} P_F < \delta$, $\limsup_{n\to\infty} P_M < \delta$ for any small $\delta > 0$.

Finer results on $P_F$ and $P_M$ are obtained in Central Limit Theorem (CLT) and large deviations analysis.

2. CLT analysis: CLTs are applied to obtain asymptotic approximations of the distributions of the test statistic
under both hypotheses. It is usually assumed that $\varepsilon \to 0$ as a function of $n$, i.e., the set of alternative
distributions becomes closer to the null distribution as $n$ increases. This ensures that the decision boundary of the
test is close to both the null distribution and the alternative distributions, so that the probabilities of false alarm
and missed detection can be analyzed using the CLT. Under this choice of $\varepsilon$, $P_F$ and $P_M$ usually converge
to nonzero values. The results characterize how the limits of $P_F$ and $P_M$ differ for different tests.

3. Large deviations analysis: The normalized limits (or asymptotic expansions) of $\log(P_F(\phi_n))$ and $\log(P_M(\phi_n))$
are studied. The distance $\varepsilon > 0$ is held constant in large deviations analysis. The proper normalization of
$\log(P_F(\phi_n))$ and $\log(P_M(\phi_n))$ must first be identified, and then the normalized limits are calculated.

Consider the case where m is fixed. 

a) Pearson's chi-square statistic and the GLRT statistic are asymptotically distributed as a chi-square distribution
with $m - 1$ degrees of freedom. These results and their extensions can be found in [6], [7], [8], [9], [10], [11].

b) The performance of Pearson's chi-square test and the GLRT is analyzed in [12] using large deviations analysis.
The following error exponent criterion is used to evaluate a test $\phi$:

$$I_F(\phi) := -\limsup_{n\to\infty} \frac{1}{n}\log(P_F(\phi_n)), \qquad I_M(\phi) := -\limsup_{n\to\infty} \frac{1}{n}\log(P_M(\phi_n)). \tag{3}$$

The GLRT is shown to have optimal error exponents while Pearson's chi-square test does not. Our use of the
term error exponent follows [13].
Next consider the case $m = O(n)$.

a) Pearson's chi-square test and the GLRT are both asymptotically consistent (for example, see [14]).

b) Pearson's chi-square statistic and the GLRT statistic both have asymptotically normal distributions. The first
work in this line is [15]. Extensions and applications of this result can be found in [16], [17], [18], [19], [20], [21].

c) A lower bound on the best achievable probability of error in CLT analysis is given in [14]: Under the condition
< lim inf„_s>oo < lini sup„_j.oo < oo, Pearson's chi-square test is asymptotically optimal. That is, for any 
test whose limit of $P_F$ is no larger than that of Pearson's chi-square test, the limit of its $P_M$ is
no smaller than that of Pearson's chi-square test. This result applies to the range of $m$ satisfying $m = o(n^2)$.

d) An achievability result (a lower bound on the error exponent) and a complementing converse result (an upper
bound on the error exponent) in the large deviations analysis have been obtained in [5]: There exists a test
for which $P_F$ and $P_M$ both decay exponentially fast with respect to $n$, i.e., $I_F$ and $I_M$ defined in (3) are
both nonzero, if and only if $m = O(n)$. Other large deviations and moderate deviations analyses of the GLRT and
Pearson's chi-square test can be found in [22], [23], [24], [25], [26], [27].

Finally consider the small sample case where $n = o(m)$ and $m = o(n^2)$.

a) Pearson's chi-square test is known to be asymptotically consistent [14]. Two other tests shown to be
asymptotically consistent are the test based on counting pairwise collisions [28] and the coincidence-based test [5]. An
approach to extend tests designed for uniform cells (2) to non-uniform cells has been proposed in [29].

b) Results on the asymptotic distribution of Pearson's chi-square statistic and the GLRT statistic have been obtained 

in mEii. 

To the best of our knowledge, the proper normalization for the large deviations analysis has not been identified
before in the small sample case.¹ We note that the classical error exponent analysis is not suitable.



B. Our contributions 

The new large deviations framework proposed here is motivated by and analogous to the classical error exponent
(3) in the large sample case. While the classical error exponent is defined with the normalization $n$, our main results
imply that for the small sample problem the following generalized error exponent is best for asymptotic analysis,
defined with respect to the normalization $r(n,m) = n^2/m$:

$$J_F(\phi) := -\limsup_{n\to\infty} \frac{1}{r(n,m)}\log(P_F(\phi_n)), \qquad J_M(\phi) := -\limsup_{n\to\infty} \frac{1}{r(n,m)}\log(P_M(\phi_n)). \tag{4}$$

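The generalized error exponents give the following approximation to the probabilities of false alarm and missed
detection:

$$P_F(\phi_n) \approx \exp\{-r(n,m) J_F(\phi)\}, \qquad P_M(\phi_n) \approx \exp\{-r(n,m) J_M(\phi)\}. \tag{5}$$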

The generalized error exponent provides new insights that are not available from asymptotic consistency, or CLT 
analysis. More precisely, the following results are established: 

1. The best achievable probability of error $P_e = \max\{P_F, P_M\}$ decays as $-\log(P_e) = r(n,m) J (1 + o(1))$,
where $r(n,m) = n^2/m$. This is applicable not only for the case where the set of alternative distributions is defined
by the total variation distance in (1), but also for a broad collection of distance functions.

2. A class of tests based on separable statistics, including the coincidence-based test $\phi^*$, is shown to
achieve the optimal pair of generalized error exponents $J_F$ and $J_M$:

$$J_M(\phi^*) = \max\{J_M(\phi) : J_F(\phi) \ge J_F(\phi^*)\}.$$

The exact formulae for these generalized error exponents are obtained.

3. The performance of Pearson's chi-square test is asymptotically worse than the optimal test. 

¹Combining the upper bounds on the probability of error given in [5], [29] with the Chernoff inequality gives a loose
upper bound on the asymptotic probability of error and does not yield the proper normalization.
C. Organization of the paper

The paper is organized as follows: The universal hypothesis testing problems and tests are presented in Section II.
The main achievability and converse results on generalized error exponents are described in Section III. Extensions
of the coincidence-based test are given in Section IV. Performance characterization of Pearson's chi-square test
is given in Section V. In Section VI it is shown that the generalized error exponent criterion is also applicable
when the set of alternative distributions is defined using many other distance functions. The paper is concluded in
Section VII.



II. Models and Preliminaries 

Here we introduce a more general model based on a sequence of universal hypothesis testing problems, each with
a finite number of outcomes (a finite alphabet). Consider an i.i.d. sequence of observations $Z_1^n := \{Z_1, \dots, Z_n\}$
where $Z_i \in [m] := \{1, 2, \dots, m\}$. Let $\mathcal{P}_m$ denote the collection of probability mass functions (p.m.f.s)
on $[m]$. We have two hypotheses: Under the null hypothesis $H_0$, the p.m.f. of $Z_i$ is given by $p$, the uniform
distribution on $[m]$:

$$p_j = 1/m \quad \text{for } j \in [m]. \tag{6}$$

Under the alternative hypothesis $H_1$, the p.m.f. of $Z_i$ belongs to a set $\mathcal{Q}_n$ given by

$$\mathcal{Q}_n := \{q \in \mathcal{P}_m : d(q,p) \ge \varepsilon\} \tag{7}$$

where $d$ is taken to be the total variation distance $d_{TV}$ defined for any pair of p.m.f.s on $[m]$:

$$d_{TV}(q,p) = \sup_{B \subseteq [m]} \{|q(B) - p(B)|\}.$$
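As a quick numerical check of this definition, the sketch below evaluates the total variation distance through the standard identity $\sup_B |q(B) - p(B)| = \frac{1}{2}\sum_j |q_j - p_j|$ and compares it with a brute-force search over subsets of a small alphabet. The function names and the example p.m.f.s are illustrative assumptions, not part of the paper.

```python
from itertools import combinations


def tv_distance(q, p):
    """Total variation distance via the half-L1 identity."""
    return 0.5 * sum(abs(qj - pj) for qj, pj in zip(q, p))


def tv_distance_bruteforce(q, p):
    """sup over all subsets B of |q(B) - p(B)|; exponential in m, for checking only."""
    m = len(p)
    best = 0.0
    for size in range(m + 1):
        for subset in combinations(range(m), size):
            gap = abs(sum(q[j] - p[j] for j in subset))
            best = max(best, gap)
    return best


p = [0.25, 0.25, 0.25, 0.25]   # uniform null on m = 4 symbols
q = [0.40, 0.30, 0.20, 0.10]   # hypothetical alternative
print(tv_distance(q, p), tv_distance_bruteforce(q, p))
```

The brute-force version is only feasible for very small $m$; the half-L1 form is the one to use in the regime of interest, where $m$ is large.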

A test $\phi = \{\phi_n\}_{n \ge 1}$ is given by a sequence of binary-valued functions $\phi_n : [m]^n \to \{0,1\}$. The test
decides in favor of $H_0$ if $\phi_n(Z_1^n) = 0$. The test is required to be powerful against the set $\mathcal{Q}_n$ of
alternative p.m.f.s, and thus its performance is evaluated using the probability of false alarm $P_F(\phi_n)$ and the
worst-case probability of missed detection $P_M(\phi_n)$:

$$P_F(\phi_n) := P_p\{\phi_n(Z_1^n) = 1\},$$

$$P_{M,q}(\phi_n) := P_q\{\phi_n(Z_1^n) = 0\},$$

$$P_M(\phi_n) := \sup_{q \in \mathcal{Q}_n} P_q\{\phi_n(Z_1^n) = 0\}.$$



An important class of tests is based on separable statistics (see [30]). A separable statistic is a test statistic
of the form

$$\sum_{j=1}^m f_j(n\Gamma_j^n),$$

where

$$\Gamma_j^n := \frac{1}{n}\sum_{i=1}^n \mathbb{I}\{Z_i = j\} \tag{8}$$

is the empirical distribution. General theorems on asymptotic distributions and asymptotic moments of separable
statistics are available in [30]. Large deviations analysis for the case $m = O(n)$ is given in [25], [26]. We are not
aware of previous general large deviations results for the small sample case where $n = o(m)$.

In this paper, we examine two tests based on separable statistics: Pearson's chi-square test [32] and the
coincidence-based test introduced in [5].

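After normalization, the test statistic of Pearson's chi-square test is given by

$$S_n^P = \frac{n}{m}\sum_{j=1}^m \frac{(n\Gamma_j^n - np_j)^2}{np_j}. \tag{9}$$

The test is given by $\phi_n^P(Z_1^n) = \mathbb{I}\{S_n^P \ge \tau_n\}$.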
The test statistic of the coincidence-based test is given by

$$S_n^* = -\sum_{j=1}^m \mathbb{I}\{n\Gamma_j^n = 1\}. \tag{10}$$

Up to sign, this test statistic $S_n^*$ counts the number of symbols in $[m]$ that appear in the sequence exactly once.
The coincidence-based test is given by $\phi_n^*(Z_1^n) = \mathbb{I}\{S_n^* \ge E_p[S_n^*] + \tau_n\}$. The coincidence-based
test is applicable only when the null distribution is uniform.

An important difference between $S_n^*$ and $S_n^P$ is that $f_j$ is bounded in $S_n^*$, while this is not true in $S_n^P$.
In Section V we show that this difference has a significant impact on the tests' performance.
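The two statistics can be computed directly from the symbol counts. The sketch below assumes the uniform null, the sign convention $S_n^* = -\#\{j : n\Gamma_j^n = 1\}$, and the normalization under which Pearson's statistic reduces to $\sum_j (n\Gamma_j^n - n/m)^2$; the function names and the sample are hypothetical.

```python
from collections import Counter


def coincidence_statistic(z):
    """S* = -(number of symbols that appear exactly once in z)."""
    counts = Counter(z)
    return -sum(1 for c in counts.values() if c == 1)


def pearson_statistic_uniform(z, m):
    """Normalized Pearson statistic for the uniform null: sum_j (N_j - n/m)^2.

    Symbols that never appear each contribute (n/m)^2, which is folded into
    the closed form sum_j N_j^2 - n^2/m.
    """
    n = len(z)
    counts = Counter(z)
    return sum(c * c for c in counts.values()) - n * n / m


sample = [1, 2, 2, 3, 3, 3]   # hypothetical observations on the alphabet {1, 2, 3, 4}
print(coincidence_statistic(sample), pearson_statistic_uniform(sample, 4))
```

A symbol seen $l$ times contributes at most 1 in absolute value to the first statistic but $l^2$ to the second, which is the boundedness difference noted above.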



Applications to continuously-valued observations 

Tests designed for finite-valued observations can be applied to solve a universal hypothesis testing problem with
continuously-valued observations by first partitioning the observation space. Consider a measurable space
$(\mathsf{Y}, \mathcal{B})$, and let $Y_1^n = \{Y_1, \dots, Y_n\}$ be an i.i.d. sequence of observations with $Y_i \in \mathsf{Y}$.
We have two hypotheses:

$$H_0 : Y_i \sim P, \qquad H_1 : Y_i \sim Q \in \mathcal{Q}. \tag{11}$$

To apply a test designed for finite-valued observations, we start with a partition of $\mathsf{Y}$:

$$\mathcal{A} = \{A_1, \dots, A_m\}$$

where $\cup_{1 \le j \le m} A_j = \mathsf{Y}$. The observation $Y_i$ is mapped to a finite-valued observation via
$T : \mathsf{Y} \to [m]$: $Z_i := T(Y_i) = j$ if $Y_i \in A_j$. Then a test defined for finite-valued observations can be
applied to $\{Z_i\}$. Assume that the partition is chosen so that the marginal of $Z_i$ is uniform,

$$P(A_j) = \frac{1}{m}. \tag{12}$$

Then tests designed for a uniform null distribution are applicable, such as the coincidence-based test. 
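For real-valued observations, one way to obtain cells satisfying (12) is to cut the range of the null CDF into $m$ equal pieces. The sketch below assumes that $P$ has a continuous CDF `cdf` known in closed form; the helper name `cell_index` and the exponential example are illustrative assumptions, not taken from the paper.

```python
import math


def cell_index(y, cdf, m):
    """Map y to j in {1, ..., m}, with cells A_j = {y : (j - 1)/m <= F(y) < j/m}."""
    u = cdf(y)
    j = int(math.floor(u * m)) + 1
    # Clamp so that F(y) = 1 falls in the last cell.
    return min(max(j, 1), m)


def exponential_cdf(y):
    """CDF of a unit-rate exponential distribution (hypothetical null)."""
    return 1.0 - math.exp(-y) if y > 0 else 0.0


observations = [0.05, 0.5, 3.0]
print([cell_index(y, exponential_cdf, 4) for y in observations])
```

Under the null, $F(Y_i)$ is uniform on $[0,1]$, so each cell has probability $1/m$ and tests designed for a uniform null can be applied to the mapped sequence.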

This partition-based approach gives tests that are optimal for the model introduced in Section I. More precisely,
suppose that the set of alternative distributions is defined as

$$\mathcal{Q} = \{Q : d_{\mathcal{A}}(Q,P) \ge \varepsilon\}$$

where $d_{\mathcal{A}}$ is defined in (1). Then in terms of the probability of false alarm and worst-case probability of
missed detection, without loss of optimality we can restrict our attention to tests whose test statistics take a
constant value on each cell $A_j$ of the partition. This is exactly the collection of partition-based tests we have described.

The model introduced in Section I assumes that the alternative distribution $Q$ is absolutely continuous with
respect to $P$. The partition-based tests are still applicable when $Q$ is not absolutely continuous with respect to $P$,
provided that the tests for finite-valued observations are designed for a more general model where we allow $p$ not
to have full support: Instead of (6), let the null distribution $p$ be

$$p_j = 1/k \ \text{ for } 1 \le j \le k, \qquad p_j = 0 \ \text{ for } k < j \le m.$$

The generalized error exponent analysis still applies, except that the normalization should be $n^2/k$ instead of $n^2/m$.

III. Generalized Error Exponents 

In this section, we describe the main results on the proper normalization for large deviations analysis for the 
small sample universal hypothesis testing problem. The following assumption is imposed throughout: 

Assumption 1. $n = o(m)$ and $m = o(n^2)$.
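To make the regime of Assumption 1 concrete, the sketch below tabulates $n/m$ and $n^2/m$ along a hypothetical growth rule $n = m^{0.75}$. The rule and the function name are assumptions chosen only for illustration; any rule with $\sqrt{m} \ll n \ll m$ shows the same behavior.

```python
def regime_summary(m, exponent=0.75):
    """Return n = m**exponent (rounded) together with n/m and n^2/m."""
    n = int(round(m ** exponent))
    return {"m": m, "n": n, "n_over_m": n / m, "n_sq_over_m": n * n / m}


for m in (10 ** 4, 10 ** 6, 10 ** 8):
    print(regime_summary(m))
```

Along such a sequence the number of samples per cell vanishes while the normalization $n^2/m$ still grows without bound.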

To show that the proper normalization to be used in the definition of the generalized error exponent is $n^2/m$, we
need to establish:

1) There is a test for which both generalized error exponents are non-zero, and therefore this normalization is
not too large.

2) For any test, at least one of the generalized error exponents is finite, and therefore this normalization is not
too small.

These are established in Theorem 1 and Theorem 2. These two theorems characterize the achievable region of
$(J_F, J_M)$. This is depicted in Fig. 1. The boundary of the achievable region is given by the following formulae:

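For $\tau \in [0, \kappa(\varepsilon) - 1]$,

$$J_F^*(\tau) := \sup_{\theta \ge 0}\{\theta\tau - \tfrac{1}{2}(e^{2\theta} - (1 + 2\theta))\},$$
$$J_M^*(\tau) := \sup_{\theta \ge 0}\{\theta(\kappa(\varepsilon) - 1 - \tau) - \tfrac{1}{2}(e^{-2\theta} - (1 - 2\theta))\kappa(\varepsilon)\}, \tag{13}$$

where $\kappa$ is the $\mathbb{R}_+$-valued function

$$\kappa(\varepsilon) = \begin{cases} 1 + 4\varepsilon^2, & \varepsilon < 0.5, \\ 1 + \varepsilon/(1 - \varepsilon), & \varepsilon \ge 0.5. \end{cases} \tag{14}$$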



Theorem 1 (Achievability). The coincidence-based test $\phi^*$ achieves the generalized error exponents given in (13),
i.e., for any $\tau \in [0, \kappa(\varepsilon) - 1]$, if the sequence of thresholds $\{\tau_n\}$ is chosen so that

$$\tau = \lim_{n\to\infty} m\tau_n/n^2, \tag{15}$$

then the coincidence-based test has the generalized error exponents:

$$J_F(\phi^*) = J_F^*(\tau), \qquad J_M(\phi^*) = J_M^*(\tau). \tag{16}$$

Theorem 2 (Converse). Consider any $\tau \in [0, \kappa(\varepsilon) - 1]$. For any test $\phi$ satisfying

$$J_F(\phi) \ge J_F^*(\tau),$$

the following upper bound on the generalized error exponent of missed detection holds:

$$J_M(\phi) \le J_M^*(\tau).$$
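The boundary of the achievable region can be evaluated numerically. The sketch below assumes the forms $J_F^*(\tau) = \sup_\theta\{\theta\tau - \frac{1}{2}(e^{2\theta} - (1+2\theta))\}$ and $J_M^*(\tau) = \sup_\theta\{\theta(\kappa(\varepsilon) - 1 - \tau) - \frac{1}{2}(e^{-2\theta} - (1-2\theta))\kappa(\varepsilon)\}$. Setting the derivative in $\theta$ to zero gives the closed forms used in the code (our own calculation, cross-checked against a grid search); the function names and the bisection routine for the balanced threshold are illustrative assumptions.

```python
import math


def kappa(eps):
    """kappa(eps): 1 + 4 eps^2 below 0.5, 1 + eps / (1 - eps) otherwise."""
    if eps < 0.5:
        return 1.0 + 4.0 * eps ** 2
    return 1.0 + eps / (1.0 - eps)


def j_false_alarm(tau):
    """Closed form of the supremum, attained at theta = 0.5 * log(1 + tau)."""
    return 0.5 * ((1.0 + tau) * math.log(1.0 + tau) - tau)


def j_missed_detection(tau, eps):
    """Closed form of the supremum, attained at theta = 0.5 * log(kappa / (1 + tau))."""
    k = kappa(eps)
    return 0.5 * ((1.0 + tau) * math.log((1.0 + tau) / k) + k - (1.0 + tau))


def j_false_alarm_grid(tau, theta_max=3.0, steps=30000):
    """Grid search over theta in [0, theta_max], used only to check the closed form."""
    best = 0.0
    for i in range(steps + 1):
        theta = theta_max * i / steps
        best = max(best, theta * tau - 0.5 * (math.exp(2 * theta) - (1 + 2 * theta)))
    return best


def balanced_threshold(eps, iterations=80):
    """Bisection for the tau in [0, kappa - 1] at which the two exponents coincide."""
    lo, hi = 0.0, kappa(eps) - 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if j_false_alarm(mid) < j_missed_detection(mid, eps):
            lo = mid   # false-alarm exponent still smaller: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


tau_balanced = balanced_threshold(0.35)
print(tau_balanced, j_false_alarm(tau_balanced), j_missed_detection(tau_balanced, 0.35))
```

The bisection relies on $J_F^*$ increasing and $J_M^*$ decreasing in $\tau$ on $[0, \kappa(\varepsilon) - 1]$, with $J_F^*(0) = 0$ and $J_M^*(\kappa(\varepsilon) - 1) = 0$.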



Fig. 1. Achievable region when $\varepsilon = 0.35$ and $\varepsilon = 0.45$ given by the lower bound in Theorem 1 and the
upper bound in Theorem 2. The lower and upper bounds meet over the entire region.

We now compare the approximation in (5) given by the generalized error exponent analysis to the actual empirical
performance of the coincidence-based test $\phi^*$. The results are shown in Fig. 2 for $\varepsilon = 0.35$ and Fig. 3 for
$\varepsilon = 0.45$. We choose the threshold $\tau$ based on (16) so that $J_F$ and $J_M$ are the same. The generalized error
exponents are estimates of the slope of $\log(P_F)$ and $\log(P_M)$ with respect to $r(n,m)$. It can be observed that the
slope from the theoretical approximation by generalized error exponents approximately matches the slope of the
simulated value. The remaining difference between the theoretical and the empirical slope in Fig. 3 is mainly due to
two reasons: First, the threshold chosen is based on the first-order approximation. It can be observed from the figure
that the slope for $P_M$ is slightly smaller than the predicted slope while the one for $P_F$ is larger. A slightly larger
threshold might yield slopes that are closer to the predicted ones. Second, the generalized error exponent is only the
first term in the asymptotic expansion of $\log(P_F)$ and $\log(P_M)$. Higher-order terms might capture the remaining difference.
Fig. 2. Performance of $\phi^*$ with $\varepsilon = 0.35$.
Fig. 3. Performance of $\phi^*$ with $\varepsilon = 0.45$.
A. Rate function and worst-case distributions

Similar to the large deviations for the large sample case, we can define a rate function for the small sample case.
Consider the coincidence-based test $\phi^*$. Consider the following restricted set of alternative distributions:

$$\{q \in \mathcal{Q}_n : \max_j q_j \le \gamma/m\}, \tag{17}$$

where $\gamma$ is a large positive constant satisfying $\gamma > \max\{2/(1 - \varepsilon), 4\varepsilon\}$. This restricted set of
distributions has bounded likelihood ratios with respect to the uniform distribution $p$. The rate function for this
test is associated with a sequence of distributions $q = \{q^{(1)}, q^{(2)}, q^{(3)}, \dots\}$, with $q^{(n)}$ in the set (17), as follows:

$$J_q(\phi^*, \tau) = -\limsup_{n\to\infty} \frac{m}{n^2}\log\Big(P_{q^{(n)}}\Big\{S_n^* < E_p[S_n^*] + \frac{n^2}{m}\tau\Big\}\Big).$$

We show that $J_q$ is a function of the following quantity:

$$\kappa(q) := \liminf_{n\to\infty} \sum_j m\,(q_j^{(n)})^2. \tag{18}$$



Theorem 3.

$$J_q(\phi^*, \tau) = \sup_{\theta \ge 0}\{\theta(-1 - \tau) - \tfrac{1}{2}(e^{-2\theta} - 1)\kappa(q)\}. \tag{19}$$

Its proof is given in Appendix B.

The rate function can be applied to identify the sequence of worst-case alternative distributions, for which the
probability of missed detection is asymptotically the largest. Note that $J_q(\phi^*, \tau)$ is monotonically increasing in
$\kappa(q)$. Therefore, the smaller the quantity $\kappa(q)$, the larger the probability of missed detection associated with $q$.
The sequence of distributions achieving the minimum $\kappa(q)$ is given in the following lemma:

Lemma 1. When p is the uniform distribution, we have 

ni 2 
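The infimum is achieved by the following bi-uniform distribution:
1. When $\varepsilon < 0.5$,

$$q_j^* = \begin{cases} 1/m + \varepsilon/\lfloor m/2 \rfloor, & j \le \lfloor m/2 \rfloor, \\ 1/m - \varepsilon/\lceil m/2 \rceil, & j > \lfloor m/2 \rfloor. \end{cases} \tag{21}$$

2. When $\varepsilon \ge 0.5$,

$$q_j^* = \begin{cases} 1/\lfloor m(1 - \varepsilon) \rfloor, & j \le \lfloor m(1 - \varepsilon) \rfloor, \\ 0, & j > \lfloor m(1 - \varepsilon) \rfloor. \end{cases} \tag{22}$$

Thus, the worst-case distributions are identified as bi-uniform distributions whose p.m.f.s take only two possible
values.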
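The bi-uniform construction is easy to check numerically. The sketch below builds the distribution for both ranges of $\varepsilon$ and evaluates its total variation distance to the uniform p.m.f. together with $m\sum_j q_j^2$, which should match $1 + 4\varepsilon^2$ or $1 + \varepsilon/(1-\varepsilon)$. The function names are hypothetical, and the use of $m\sum_j q_j^2$ as the finite-$m$ version of $\kappa(q)$ is an assumption of this sketch.

```python
import math


def worst_case_pmf(m, eps):
    """Bi-uniform p.m.f. on m symbols at total variation distance eps from uniform."""
    if eps < 0.5:
        low = m // 2          # floor(m / 2) symbols receive extra mass
        high = m - low        # ceil(m / 2) symbols lose mass
        return [1.0 / m + eps / low] * low + [1.0 / m - eps / high] * high
    k = int(math.floor(m * (1.0 - eps)))
    return [1.0 / k] * k + [0.0] * (m - k)


def tv_to_uniform(q):
    m = len(q)
    return 0.5 * sum(abs(qj - 1.0 / m) for qj in q)


def m_sum_squares(q):
    """m * sum_j q_j^2, the finite-m counterpart of kappa(q)."""
    return len(q) * sum(qj * qj for qj in q)


for eps in (0.35, 0.75):
    q = worst_case_pmf(1000, eps)
    print(eps, tv_to_uniform(q), m_sum_squares(q))
```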

Proof of Lemma 1: The main task is to show that any optimizer $q^*$ is a bi-uniform distribution. The formulae
(21) and (22) follow from solving the optimization in (20) restricted to bi-uniform distributions.

Let $J_+ = \{j : q_j^* > p_j\}$, $J_- = \{j : q_j^* \le p_j\}$. The following quadratic programming problem has a unique
optimal solution $x^* = q^*$:

$$x_j = q_j^* \ \text{ for } j \in J_-,$$
$$x_j \ge p_j \ \text{ for } j \in J_+.$$

By Jensen's inequality, $x^*$ must satisfy $x_j^* = x_{j'}^*$ for all $j, j' \in J_+$. Thus, $q^*$ also satisfies $q_j^* = q_{j'}^*$ for
all $j, j' \in J_+$. The same conclusion holds for $j \in J_-$. Consequently, $q^*$ must be a bi-uniform distribution. ■



B. Sketch of the proofs for Theorem 1 and Theorem 2

The large deviations characterization of $P_F$ for the coincidence-based test follows from the following asymptotic
approximation of the logarithmic moment generating function of its test statistic:

$$\log(E_p[\exp\{-\theta(n + S_n^*)\}]) = \frac{1}{2}\frac{n^2}{m}\Big(m\sum_{j=1}^m p_j^2\Big)(e^{-2\theta} - 1) + O\Big(\frac{n^3}{m^2}\Big) + O(1).$$

A characterization of $P_M$ is obtained in a similar way, except that we need to work with the set of alternative
distributions. We show that the probability of missed detection is dominated by that associated with the worst-case
distributions given in Lemma 1. The details are given in Appendix B.

The main idea to prove the converse result is the following: A sequence of events $\{B_{n,\tau,\delta}\}$ is constructed so
that (i) the probability of these events can be lower-bounded based on the condition on $P_F$; (ii) the probability of
missed detection conditioned on these events is lower-bounded. The key to the proof is the following inequality:

$$P_M(\phi_n) \ge \sup_{q \in \mathcal{Q}_n} P_q(\{\phi_n = 0\} \cap B_{n,\tau,\delta}) \ge \sup_{q \in \mathcal{Q}_n} \frac{P_q(\{\phi_n = 0\} \cap B_{n,\tau,\delta})}{P_p(\{\phi_n = 0\} \cap B_{n,\tau,\delta})}\, P_p(\{\phi_n = 0\} \cap B_{n,\tau,\delta}).$$

A lower bound on the second term follows from the construction of the events and the assumption on the probability
of false alarm. To lower-bound the first term, we construct a collection of distributions over which the largest
likelihood ratio is always lower-bounded on the event $B_{n,\tau,\delta}$. These distributions are obtained by taking the
worst-case distribution $q^*$ given in (21) and permuting the symbols in $[m]$. Let $\mathcal{U}_m$ denote the collection of all
subsets of $[m]$ whose cardinality is $\lfloor m/2 \rfloor$. For each set $U \in \mathcal{U}_m$, define the distribution $q_U$ as

$$q_{U,j} = \begin{cases} 1/m + \varepsilon/\lfloor m/2 \rfloor, & j \in U, \\ 1/m - \varepsilon/\lceil m/2 \rceil, & j \in [m] \setminus U. \end{cases} \tag{23}$$

Then a lower bound is established for

$$\sup_{U \in \mathcal{U}_m} \frac{P_{q_U}(\{\phi_n = 0\} \cap B_{n,\tau,\delta})}{P_p(\{\phi_n = 0\} \cap B_{n,\tau,\delta})}.$$

The details are given in Appendix D.
IV. Extensions of the Coincidence-Based Test



This section collects together extensions of Section III in terms of tests and models. We first propose a collection 
of tests that extend the coincidence-based test, and provide the freedom for fine-tuning the performance for finite 
samples. We then propose an extension of the coincidence-based test for non-uniform p. 



A. Extensions considering symbols appearing more than once 

The coincidence-based test uses only the number of symbols that appear in the sequence exactly once. We now
add terms to the test statistic that also depend on the number of symbols appearing more than once to create a
broader collection of tests. Conditions will be established under which these tests have optimal generalized error
exponents. Consider the class of test statistics of the following form:

$$S_n^{*+} = S_n^* + \sum_{j=1}^m \sum_{l=2}^{\bar{l}} v_l\, \mathbb{I}\{n\Gamma_j^n = l\}. \tag{24}$$

The test is given by

$$\phi_n^{*+}(Z_1^n) = \mathbb{I}\{S_n^{*+} - E_p[S_n^{*+}] \ge \tau_n\}.$$

Theorem 4. If $\bar{l} < \infty$, $v_2 = 0$, and $v_l \ge 0$ for all $3 \le l \le \bar{l}$, then the test $\phi^{*+}$ achieves the optimal
generalized error exponents given in (13).

Its proof is given in Appendix C.

The additional terms for $l \ge 3$ in the separable statistic give us ways to fine-tune the test for better finite-sample
performance. One interesting question is to obtain finer asymptotic approximations of $\log(P_F)$ and $\log(P_M)$ that
provide guidance on how to select the weights $v_l$.

For the case with $v_2 \ne 0$, we have the following conjecture:

Conjecture 1. If $\phi^{*+}$ satisfies $\bar{l} < \infty$, $v_2 > -2$, and $v_l \ge 0$ for all $3 \le l \le \bar{l}$, then the test is optimal
in terms of the generalized error exponent.



B. Extensions to non-uniform p 

The coincidence-based test can be extended to the case where p is not necessarily uniform but the likelihood 
ratio between p and the uniform distribution remains bounded. 

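Assumption 2. There exists a constant $\eta > 0$ such that $\max_j m p_j \le \eta$ holds for all $n$.

The following separable statistic is considered,

$$S_n^w = \sum_{j=1}^m f_j(n\Gamma_j^n)$$

with

$$f_j(n\Gamma_j^n) = \begin{cases} -np_j, & n\Gamma_j^n = 1, \\ 1, & n\Gamma_j^n = 2, \\ 0, & \text{otherwise}. \end{cases} \tag{25}$$

The weighted coincidence-based test is given by $\phi_n^w(Z_1^n) = \mathbb{I}\{S_n^w \ge \tau_n\}$.

The choice of coefficients given in (25) ensures that $E_\nu[S_n^w]$ approximates the $\ell_2$-distance between $\nu$ and $p$: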



Lemma 2. For $\nu \in \mathcal{P}_m$, the expectation of $S_n^w$ is given by:

,2 



n 



i=i 

The proposed test has nonzero generalized error exponents:

Theorem 5. Suppose Assumption 1 and Assumption 2 hold. For $\tau \in (0, 2\varepsilon^2)$ where $\tau$ is defined in (15), the test
$\phi^w$ has nonzero generalized error exponents:

$$J_F(\phi^w) > 0, \qquad J_M(\phi^w) > 0.$$

Its proof is given in Appendix C.

V. Pearson's Chi-Square Test 

In this section, we investigate the performance of Pearson's chi-square test given in (9). We find that this test
has a zero generalized error exponent, and therefore its probability of error is asymptotically larger than that of the
coincidence-based test.

Pearson's chi-square test is asymptotically consistent in the small sample case:

Proposition 1 (Asymptotic consistency). Under Assumption 1 there exists a sequence of thresholds $\{\tau_n\}$ with
which Pearson's chi-square test is asymptotically consistent:

$$\lim_{n\to\infty} P_F(\phi_n^P) = 0, \qquad \lim_{n\to\infty} P_M(\phi_n^P) = 0.$$

We give a proof that highlights the relationship between Pearson's chi-square test and the coincidence-based test. 
Proof of Proposition ir Let r„ = n + ^^(^(e) — 1). Applying approximations of moments of separable 
statistic given in Lemma [6[and Lemma [8} we obtain 

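$$E_p[S_n^P] = n + O\Big(\frac{n}{m}\Big), \qquad \mathrm{Var}_p[S_n^P] = 2\frac{n^2}{m}\Big(m\sum_{j=1}^m p_j^2\Big)(1 + o(1)). \tag{26}$$

Applying Chebyshev's inequality gives $\lim_{n\to\infty} P_F(\phi_n^P) = 0$.

We bound $P_M(\phi_n^P)$ by coupling Pearson's chi-square statistic $S_n^P$ with the coincidence-based test statistic $S_n^*$:

$$S_n^P = \sum_{j=1}^m (n\Gamma_j^n - np_j)^2 = \sum_{j=1}^m (n\Gamma_j^n)^2 - \frac{n^2}{m} \ge 2\sum_{j=1}^m \mathbb{I}\{n\Gamma_j^n \ge 2\}\, n\Gamma_j^n + \sum_{j=1}^m \mathbb{I}\{n\Gamma_j^n = 1\} - \frac{n^2}{m} = 2n + S_n^* - \frac{n^2}{m},$$

where the inequality follows from $(n\Gamma_j^n)^2 \ge 2(n\Gamma_j^n)$ when $n\Gamma_j^n > 1$. Consequently,

$$\{S_n^P < \tau_n\} \subseteq \Big\{S_n^* < \tau_n - 2n + \frac{n^2}{m}\Big\}. \tag{27}$$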

The asymptotic approximation on the expectation of S* obtained from Lemma [6] gives 



2 2 

77 77 77 

r^-2n+- = Ep[S:] + - 1) + 0( — ). 

m m 



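It follows from Theorem 1 that the coincidence-based test is asymptotically consistent. Thus

$$\lim_{n\to\infty} \sup_{q \in \mathcal{Q}_n} P_q\Big\{S_n^* < \tau_n - 2n + \frac{n^2}{m}\Big\} = 0.$$

Applying (27), we obtain

$$\lim_{n\to\infty} \sup_{q \in \mathcal{Q}_n} P_q\{S_n^P < \tau_n\} = 0.$$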

■ 
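The moment approximations used in this proof can be checked by simulation under the uniform null. The sketch below assumes the normalization under which Pearson's statistic equals $\sum_j (n\Gamma_j^n)^2 - n^2/m$, so that its mean is close to $n$ and its variance close to $2n^2/m$. The sample sizes, the seed, and the function names are arbitrary illustrative choices.

```python
import random
from collections import Counter


def pearson_statistic_uniform(z, m):
    """sum_j N_j^2 - n^2/m, the normalized Pearson statistic for a uniform null."""
    n = len(z)
    return sum(c * c for c in Counter(z).values()) - n * n / m


def simulate_moments(n, m, trials, seed=0):
    """Monte Carlo estimate of the mean and variance of the statistic under the null."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        z = [rng.randrange(m) for _ in range(n)]
        values.append(pearson_statistic_uniform(z, m))
    mean = sum(values) / trials
    var = sum((v - mean) ** 2 for v in values) / (trials - 1)
    return mean, var


n, m = 200, 2000
mean, var = simulate_moments(n, m, trials=500)
print(mean, n)            # sample mean against the leading term n
print(var, 2 * n * n / m)  # sample variance against 2 n^2 / m
```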

However, the probability of false alarm of Pearson's chi-square test is asymptotically larger than that of the
coincidence-based test: We show that its generalized error exponent of false alarm is zero.

Theorem 6. Suppose Assumption 1 holds. Assume in addition that $m = o(n^2/\log(n)^2)$. If the sequence of thresholds
is chosen so that

$$\lim_{n\to\infty} P_M(\phi_n^P) = 0, \tag{28}$$

then the generalized error exponent of false alarm is zero, i.e.,

$$J_F(\phi^P) = 0. \tag{29}$$

We conjecture that the conclusion holds without the assumption $m = o(n^2/\log(n)^2)$.

We now compare Pearson's chi-square test and the coincidence-based test. Note that Pearson's chi-square test
statistic can be written as

$$S_n^P = -\frac{n^2}{m} + \sum_{j=1}^m \mathbb{I}\{n\Gamma_j^n = 1\} + \sum_{j=1}^m 4\,\mathbb{I}\{n\Gamma_j^n = 2\} + \sum_{l=3}^{\infty}\sum_{j=1}^m l^2\, \mathbb{I}\{n\Gamma_j^n = l\}. \tag{30}$$

The main difference between these two tests is how the coefficients of $\mathbb{I}\{n\Gamma_j^n = l\}$ for $l > 2$ are chosen.
Remove all the terms corresponding to $l \ge 3$ and consider the following separable statistic:

$$S_n^{P\circ} = -\frac{n^2}{m} + \sum_{j=1}^m \mathbb{I}\{n\Gamma_j^n = 1\} + \sum_{j=1}^m 4\,\mathbb{I}\{n\Gamma_j^n = 2\}. \tag{31}$$

Then we have the following relationship between these three test statistics:

$$\{S_n^P < \bar\tau_n\} \subseteq \{S_n^* < \tau_n\} \subseteq \{S_n^{P\circ} < \bar\tau_n\} \tag{32}$$

where the thresholds $\tau_n$ and $\bar\tau_n$ satisfy $\bar\tau_n = \tau_n + 2n - n^2/m$. This is depicted in Fig. 4. Note that the
region in which Pearson's chi-square test decides in favor of $H_1$ is larger than that of the coincidence-based test,
and the probability that the empirical distribution falls into this region is asymptotically larger than
$\exp\{-\alpha n^2/m\}$ for any $\alpha > 0$. This is made precise in the proof of Theorem 6. On the other hand, we can show that
the test $\phi^{P\circ}$ associated with $S_n^{P\circ}$ has $J_M = 0$ by considering a sequence of alternative distributions whose
likelihood ratios with respect to $p$ increase to infinity. In sum, we have

1) $J_F(\phi^P) = 0$, $J_M(\phi^P) > 0$;

2) $J_F(\phi^*) > 0$, $J_M(\phi^*) > 0$;

3) $J_F(\phi^{P\circ}) > 0$, $J_M(\phi^{P\circ}) = 0$.
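The ordering between the three statistics can be verified sample by sample. The sketch below assumes the uniform null, the sign convention $S_n^* = -\#\{j : n\Gamma_j^n = 1\}$, Pearson's statistic in the form $\sum_l l^2 K_l - n^2/m$ with $K_l$ the number of symbols seen exactly $l$ times, and the truncated statistic that keeps only the $l = 1, 2$ terms. It checks that the truncated statistic is at most $2n + S_n^* - n^2/m$, which in turn is at most Pearson's statistic. All function names are hypothetical.

```python
import random
from collections import Counter


def multiplicities(z):
    """Map l -> number of symbols that appear exactly l times in z."""
    return Counter(Counter(z).values())


def s_star(z):
    return -multiplicities(z)[1]


def s_pearson(z, m):
    n = len(z)
    return sum(l * l * k for l, k in multiplicities(z).items()) - n * n / m


def s_truncated(z, m):
    n = len(z)
    mult = multiplicities(z)
    return -n * n / m + mult[1] + 4 * mult[2]


def sandwich_holds(z, m, tol=1e-9):
    """Check s_truncated <= 2n + s_star - n^2/m <= s_pearson (up to rounding)."""
    n = len(z)
    middle = 2 * n + s_star(z) - n * n / m
    return s_truncated(z, m) <= middle + tol and middle <= s_pearson(z, m) + tol


rng = random.Random(0)
checks = [sandwich_holds([rng.randrange(50) for _ in range(30)], 50) for _ in range(200)]
print(all(checks))
```

The gap on the right comes only from symbols seen three or more times, each contributing $l^2 - 2l$; this is the unbounded part of Pearson's statistic that the coincidence-based test discards.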




Fig. 4. Decision regions in the space of p.m.f.s for Pearson's chi-square test, the coincidence-based test and the test given in (31).

Proof of Theorem 6: The requirement $P_M(\phi_n^P) \to 0$ imposes an upper bound on the threshold $\tau_n$ for $\phi^P$:

Lemma 3. In order for (28) to hold, for large enough $n$, we must have

$$\tau_n \le \bar\tau_n := E_p[S_n^P] + \frac{n^2}{m}\kappa(\varepsilon) + 2\frac{n}{\sqrt{m}}.$$
Consider the event that the first symbol appears many times: 



In the event $A_n$, the first term $f_1(n\Gamma_1^n)$ in the summation in the definition of $S_n^P$ given in (9) is approximately
$2\frac{n^2}{m}\kappa(\varepsilon)$. This drives the value of $S_n^P$ above the threshold $\tau_n$. Thus the probability of false alarm
conditioned on this event converges to one, as summarized in Lemma 4. On the other hand, the probability of $A_n$ does
not decay exponentially fast with respect to $n^2/m$, as summarized in Lemma 5.

Lemma 4.

$$P_p\{S_n^P \ge \tau_n \mid A_n\} = 1 - o(1).$$

Lemma 5.

$$\lim_{n\to\infty} \frac{m}{n^2}\log(P_p\{A_n\}) = 0.$$

Combining Lemma 3, Lemma 4 and Lemma 5 together, we conclude

$$J_F(\phi^P) \le -\liminf_{n\to\infty} \frac{m}{n^2}\log(P_p\{S_n^P \ge \tau_n \mid A_n\} P_p\{A_n\}) = 0.$$

The proofs of these three lemmas are given in Appendix E.



VI. Alternative Distributions Based on f-Divergence

The set of alternative distributions studied in previous sections is defined using the total variation distance. The
generalized error exponent analysis with the same normalization $r(n,m) = n^2/m$ also applies to other distance
functions, as we show in Proposition 2 and Proposition 3. In this section, the set of alternative distributions $\mathcal{Q}_n$
defined in (7) is based on a general distance function $d$ rather than $d = d_{TV}$. Examples include the KL divergence
$$d_{KL}(q,p) = \sum_j q_j \log(q_j/p_j),$$
and its generalization known as $f$-divergence,
$$d_f(q,p) = \sum_j p_j f(q_j/p_j), \tag{33}$$
where $f$ is a convex function with $f(1) = 0$.
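
As an illustration of definition (33), the following minimal Python sketch evaluates the total variation distance, the KL divergence, and a generic $f$-divergence for two small p.m.f.s. The function names, the example vectors, and the convention that total variation is half the $L_1$ distance are assumptions made for this sketch; they are not taken from the paper.

```python
import math

def total_variation(q, p):
    # Assumed convention: half of the L1 distance between the two p.m.f.s.
    return 0.5 * sum(abs(qj - pj) for qj, pj in zip(q, p))

def kl_divergence(q, p):
    # d_KL(q, p) = sum_j q_j log(q_j / p_j); terms with q_j = 0 contribute 0.
    return sum(qj * math.log(qj / pj) for qj, pj in zip(q, p) if qj > 0)

def f_divergence(q, p, f):
    # d_f(q, p) = sum_j p_j f(q_j / p_j), as in (33); requires p_j > 0.
    return sum(pj * f(qj / pj) for qj, pj in zip(q, p))

def f_kl(t):
    # Convex generator with f(1) = 0 whose f-divergence is the KL divergence.
    return t * math.log(t) if t > 0 else 0.0

p = [0.25, 0.25, 0.25, 0.25]   # hypothetical uniform null distribution
q = [0.40, 0.30, 0.20, 0.10]   # hypothetical alternative distribution

print("TV :", total_variation(q, p))
print("KL :", kl_divergence(q, p))
print("d_f:", f_divergence(q, p, f_kl))
```

With the generator $f(t) = t \log t$ the last two printed values coincide, which is the sense in which the $f$-divergence generalizes the KL divergence.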

Conditions under which the generalized error exponent analysis applies are contained in the following: 

Proposition 2. Suppose the distance function $d$ satisfies
1) $d(q,p) \ge \alpha\, d_{TV}(q,p)$ for some $\alpha > 0$.
2)
$$\liminf_{m\to\infty} \inf\Big\{\sum_j \frac{(q_j - p_j)^2}{p_j} : d(q,p) \ge \epsilon,\ q \in \mathcal{P}_m\Big\} > 0.$$

Then $n^2/m$ is the appropriate normalization for the large deviations analysis for small $\epsilon > 0$: There exists a test
$\phi$ such that
$$J_F(\phi) > 0, \quad J_M(\phi) > 0.$$
There is a constant $\bar{J}$ satisfying $0 < \bar{J} < \infty$ such that for any test $\phi$, we have
$$\min\{J_F(\phi), J_M(\phi)\} \le \bar{J}.$$

For the set of alternative distributions defined in (7) with the $f$-divergence $d = d_f$, the generalized error exponent
can be applied subject to conditions on $f$:

Proposition 3. Suppose $f$ satisfies the following conditions:

1) For some $0 < x < 1$,
$$\tfrac{1}{2}\big(f(1-x) + f(1+x)\big) > f(1).$$

2) There is a constant $\alpha > 0$ such that for all $x$,
$$f(x) \le \alpha (x-1)^2.$$

Then $n^2/m$ is the appropriate normalization for the large deviations analysis for small $\epsilon > 0$: There exists a test
$\phi$ such that
$$J_F(\phi) > 0, \quad J_M(\phi) > 0.$$
There is a constant $\bar{J}$ satisfying $0 < \bar{J} < \infty$ such that for any test $\phi$, we have
$$\min\{J_F(\phi), J_M(\phi)\} \le \bar{J}.$$

Proof of Proposition 2: The converse result in Theorem 2 is proved by showing that the worst-case probability
of missed detection over the set of distributions given in (23) is lower-bounded regardless of the test used. The
first condition in Proposition 2 guarantees that these distributions are still in the set $\mathcal{Q}_n$ of alternative distributions.

For the achievability result, the critical step is to show that the rate function is positive for any alternative
distribution whose likelihood ratio with respect to $p$ is bounded. The second condition in Proposition 2 guarantees
that $\kappa$, defined earlier in the paper, is positive, which in turn ensures a positive rate function. ■

Proof of Proposition 3: The proof is similar to that of Proposition 2. The first condition of Proposition 3
ensures that the collection of bi-uniform distributions given in (23), used in the proof of the converse result, is in
the set $\mathcal{Q}_n$ of alternative distributions: For $q_n$ defined in (23) with $\epsilon$ replaced by $\epsilon'$, for even $m$, and for small enough
$\epsilon$, we have
$$d_f(q_n, p) = \tfrac{1}{2} f(1 + 2\epsilon') + \tfrac{1}{2} f(1 - 2\epsilon') \ge \epsilon.$$

The second condition implies that
$$\sum_j \frac{(q_j - p_j)^2}{p_j} \ge \frac{1}{\alpha}\, d_f(q,p).$$

Thus, the rate function is positive for any alternative distribution whose likelihood ratio with respect to $p$ is bounded.
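
The two conditions of Proposition 3 can be checked numerically on a small instance. The sketch below is only an illustration under stated assumptions: the bi-uniform distribution is assumed to put mass $(1+2\epsilon')/m$ on half of the symbols and $(1-2\epsilon')/m$ on the rest (consistent with the displayed value of $d_f(q_n,p)$ for uniform $p$), and the generator $f(x) = x\log x - x + 1$ is a hypothetical choice that yields the KL divergence and satisfies $f(x) \le (x-1)^2$.

```python
import math

def f_divergence(q, p, f):
    # d_f(q, p) = sum_j p_j f(q_j / p_j), as in (33).
    return sum(pj * f(qj / pj) for qj, pj in zip(q, p))

def bi_uniform(m, eps):
    # Assumed form of the bi-uniform alternative for even m and uniform p.
    half = m // 2
    return [(1 + 2 * eps) / m] * half + [(1 - 2 * eps) / m] * half

def f_centered_kl(x):
    # Convex, f(1) = 0, and f(x) <= (x - 1)^2 because log(x) <= x - 1.
    return x * math.log(x) - x + 1

def chi_square(q, p):
    # sum_j (q_j - p_j)^2 / p_j, the quantity appearing in the second condition.
    return sum((qj - pj) ** 2 / pj for qj, pj in zip(q, p))

m, eps = 10, 0.1
p = [1.0 / m] * m
q = bi_uniform(m, eps)

lhs = f_divergence(q, p, f_centered_kl)
rhs = 0.5 * f_centered_kl(1 + 2 * eps) + 0.5 * f_centered_kl(1 - 2 * eps)
print("d_f(q, p)            :", lhs)
print("closed-form value    :", rhs)
print("chi-square divergence:", chi_square(q, p))
```

For this generator the constant in the second condition can be taken as $\alpha = 1$, so the chi-square divergence printed last is an upper bound on $d_f(q,p)$.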



VII. Conclusions and Discussions 

We have shown that the classical error exponent criterion, which appears in the large deviation analysis for 
universal hypothesis testing problems with large number of samples, can be extended to the small sample case, 
provided the normalization is modified to account for both the sample size n and the alphabet size m. 

We offer a few discussions on the results and point out directions for future research: 

1. The analysis in this paper is asymptotic in nature. The generalized error exponent gives the leading term in
the asymptotic expansion of the probability of error. Finer approximations are valuable, especially for characterizing
the finite sample performance when $n/m$ is not very small. For example, finer approximations can reveal the
differences among the tests in the class described in Section IV-A, which all have the same generalized error exponents.



2. The alphabet size $m$ is used in this and previous work to capture the "complexity" of the hypothesis
testing problem in the case where the null distribution is uniform. It remains to be seen how this can be generalized
to other cases, where the null distribution is far from uniform or has a countably infinite support. A possible
generalization of the alphabet size is the Rényi entropy of $p$, which is equal to $\log(m)$ when $p$ is uniform.

3. It is desirable to establish general large deviation characterizations of separable statistics for small sample
problems, similar to those established for $n \asymp m$ in [25], [26]. Such results could provide more insights on how the
coefficients of a separable statistic affect the test's performance. For example, how the performance of a test with 
test statistic — npjY varies with pi 

4. We have focused on the simple goodness-of-fit problem in this paper, in which p is fully specified. A natural 
extension is the composite goodness-of-fit problem in which p is not fully specified but assumed to be in a known 
set. A similar generalized error exponent concept should exist for the composite case. 

5. There are many other problems for which the approach presented in this paper is relevant. Examples include
the classification problem [33], [34], [35], the problem of testing whether two distributions are close [36], [37], and
probability estimation over a large or unknown alphabet [38], [39], [40].

In the recent work [41] it is shown how to adapt the methods presented here to the classification problem. The
generalized error exponent analysis is applied to characterize the different ways in which the number of training
samples and the number of test samples affect the performance of classification algorithms.

6. Topological structure often contains critical information that is easily ignored in the approaches focused
on in this work. In particular, in this paper we have not considered any notion of distance between points in
the alphabet. Other approaches such as the support vector machine, or more recent work such as [42], are based
primarily on topology. It will be desirable to create a coherent bridge between the approach developed here and
topological approaches to hypothesis testing. It is likely that current information-theoretic tools, such as concepts
from lossy source-coding, can help to create these bridges. We are also considering extensions of the work described
here to the feature selection problem of [43], [44], in which $m$ is interpreted as the number of features rather than
the alphabet size.

Organization of the Appendix 

Approximations to the moments of the separable statistics are given in Appendix A, and the results are used in
the rest of the proofs.

The proofs of Theorem 1 and Theorem 3 are given in Appendix B. Similar arguments are used in the proofs of
Theorem 4 and Theorem 5 given in Appendix C.

The proof of Theorem 5 given in Appendix D can be read almost independently of Appendix B and C.

The lemmas supporting the proof of Theorem 6 are given in Appendix E and can be read independently of
Appendix B, C and D.

Appendix A 
Moments of Separable Statistics 

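This section provides a survey of results on asymptotic approximations to moments of separable statistics. The
results hold for the distributions in the set defined in (11).

Lemma 6 (Expectation of a separable statistic). Consider the separable statistic $\sum_{j=1}^m f_j(n\Gamma^n_j)$. Suppose we have
$\max_j |f_j(x)| \le a_0 e^{\alpha_0 x}$ for some $a_0 > 0$. Then its expectation for $\nu$ in the set defined in (11) is given by:
$$E_\nu\Big[\sum_{j=1}^m f_j(n\Gamma^n_j)\Big] = \sum_{j=1}^m f_j(0) + n\sum_{j=1}^m \nu_j\big(f_j(1) - f_j(0)\big) + \frac{n^2}{2}\sum_{j=1}^m \nu_j^2\big(f_j(0) - 2f_j(1) + f_j(2)\big) + O\Big(\frac{n^3}{m^2}\Big).$$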

Proof: We have |/,(3)| = O(^) and 

OO / \ OO ^ Q 

■' \x J m I iog(e"07n/mj| m m-^ 



Consequently, 



E.[X:/,(nr- 



m 

j=i 

+/.(2)Q-|(i-.r-^+o(^3)] 

m 

2 m 3 

+ yE^|(/^(0)-2/.(1) + /.(2))+O(^2)- 
j=i 

Lemma 6 leads to Lemma 2 as well as the following asymptotic approximation of the expectation of $S^*_n$:
Lemma 7. For any $\nu$ in the set defined in (11):
$$E_\nu[S^*_n] = -n + \frac{n^2}{m}\Big(m\sum_{j=1}^m \nu_j^2\Big) + O\Big(\frac{n^3}{m^2}\Big).$$
This will be used in the proof of Theorem 1.
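
The expansion in Lemma 6 can be sanity-checked on the simplest separable statistic, the number of symbols that appear exactly once (that is, $f_j(1) = 1$ and $f_j(k) = 0$ otherwise), for which the lemma predicts an expectation of $n - n^2\sum_j \nu_j^2 + O(n^3/m^2)$. The sketch below compares this with the exact binomial expression; the function names and the choice $n = 100$, $m = 10^4$ with a uniform distribution are illustrative assumptions.

```python
def expected_singletons_exact(n, nu):
    # Symbol j appears exactly once with probability n * nu_j * (1 - nu_j)^(n - 1).
    return sum(n * v * (1.0 - v) ** (n - 1) for v in nu)

def expected_singletons_approx(n, nu):
    # Leading terms from Lemma 6 with f(1) = 1 and f(k) = 0 for k != 1.
    return n - n * n * sum(v * v for v in nu)

n, m = 100, 10_000
nu = [1.0 / m] * m          # hypothetical uniform distribution

exact = expected_singletons_exact(n, nu)
approx = expected_singletons_approx(n, nu)
print("exact :", exact)
print("approx:", approx)
print("gap   :", abs(exact - approx), " n^3/m^2 =", n ** 3 / m ** 2)
```

The gap between the two values is of the same order as $n^3/m^2$, in line with the error term of the lemma.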
Lemma 8 (Variance of a separable statistic). Consider a symmetric separable statistic $\sum_{j=1}^m f(n\Gamma^n_j)$. Suppose
$|f(x)| \le a_0 e^{\alpha_0 x}$ for some $a_0 > 0$. Moreover, suppose $f(0) = 0$ and $f(2) \ne 2f(1)$. Then its variance for $\nu$ in
the set defined in (11) is given by
$$\mathrm{Var}_\nu\Big[\sum_{j=1}^m f(n\Gamma^n_j)\Big] = \frac{n^2}{2m}\big(f(2) - 2f(1)\big)^2\Big(m\sum_{j=1}^m \nu_j^2\Big)(1 + o(1)).$$

Lemma 8 is the combination of Equation 2.11 and Equation 2.20 in [30].

Appendix B
Proofs of Theorem 1 and Theorem 3

The proof of Theorem 1 and Theorem 3 is based on the Chernoff bound and the Gärtner-Ellis Theorem. The key
step is to obtain an asymptotic approximation to the logarithmic moment generating function of the test statistic.
To simplify the presentation, instead of $S^*_n$ we work with the following statistic:
$$\tilde{S}^*_n := \sum_{j=1}^m \mathbb{I}\{n\Gamma^n_j = 1\} - n.$$

Its logarithmic moment generating function is denoted by
$$\Lambda_{\nu,\tilde{S}^*_n}(\theta) := \log\big(E_\nu[\exp\{\theta \tilde{S}^*_n\}]\big). \tag{34}$$
Asymptotic approximations or bounds to $\Lambda_{\nu,\tilde{S}^*_n}(\theta)$ are presented in the next two sections.

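A. Approximation to the logarithmic moment generating function for distributions in the set defined in (11)

Bounds and approximations for $\Lambda_{\nu,\tilde{S}^*_n}$ are first obtained for the restricted set of distributions defined in (11).

Proposition 4. For any $\nu$ in the set defined in (11), the logarithmic moment generating function for the statistic $\tilde{S}^*_n$ has the following
asymptotic expansion
$$\Lambda_{\nu,\tilde{S}^*_n}(\theta) = \frac{1}{2}\frac{n^2}{m}\Big(m\sum_{j=1}^m \nu_j^2\Big)\big(e^{-2\theta} - 1\big) + O\Big(\frac{n^3}{m^2}\Big) + O(1).$$

The approximation errors $O(n^3/m^2)$ and $O(1)$ are uniform over the set defined in (11).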
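The proof uses the Poissonization technique, and the procedure is applicable for many separable statistics including
$\tilde{S}^*_n$: Let $\{X_j\}$ be a sequence of independent Poisson random variables with parameter $\lambda\nu_j$ for some $\lambda > 0$. Then
for any integers $u_1, \ldots, u_m$ satisfying $\sum_{j=1}^m u_j = n$, we have
$$P\{n\Gamma^n_j = u_j \text{ for all } j\} = P\Big\{X_j = u_j \text{ for all } j \,\Big|\, \sum_{j=1}^m X_j = n\Big\}.$$

Therefore, the moment generating function of a separable statistic $\sum_{j=1}^m f_j(n\Gamma^n_j)$ admits the following representation:
$$E_\nu\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(n\Gamma^n_j)\Big\}\Big] = E\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(X_j)\Big\} \,\Big|\, \sum_{j=1}^m X_j = n\Big].$$

It is related to the moment generating function $A_\lambda(\theta)$ for $\sum_{j=1}^m f_j(X_j)$ as follows:
$$A_\lambda(\theta) := E\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(X_j)\Big\}\Big]
= \sum_{n=0}^\infty \frac{\lambda^n}{n!}e^{-\lambda}\, E\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(X_j)\Big\} \,\Big|\, \sum_{j=1}^m X_j = n\Big]
= \sum_{n=0}^\infty \frac{\lambda^n}{n!}e^{-\lambda}\, E_\nu\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(n\Gamma^n_j)\Big\}\Big].$$

It follows from the independence of the variables $\{X_j\}$ that the moment generating function $A_\lambda(\theta)$ has the following
formula:
$$A_\lambda(\theta) = \prod_{j=1}^m \sum_{k=0}^\infty e^{\theta f_j(k)}\frac{(\lambda\nu_j)^k}{k!}e^{-\lambda\nu_j}.$$

Since $e^{\lambda}A_\lambda(\theta)$ is analytic in $\lambda$, the moment generating function of $\sum_{j=1}^m f_j(n\Gamma^n_j)$ can be obtained via Cauchy's
theorem:
$$E_\nu\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(n\Gamma^n_j)\Big\}\Big] = \frac{n!}{2\pi i}\oint e^{\lambda}A_\lambda(\theta)\frac{d\lambda}{\lambda^{n+1}}, \tag{36}$$
where the integration is carried out along any closed contour around $\lambda = 0$ in the complex plane. These arguments
lead to the following lemma:

Lemma 9. The moment generating function of the separable statistic $\sum_{j=1}^m f_j(n\Gamma^n_j)$ is given by
$$E_\nu\Big[\exp\Big\{\theta\sum_{j=1}^m f_j(n\Gamma^n_j)\Big\}\Big] = \frac{n!}{2\pi i}\oint \frac{1}{\lambda^{n+1}}\prod_{j=1}^m \sum_{k=0}^\infty e^{\theta f_j(k)}\frac{(\lambda\nu_j)^k}{k!}\,d\lambda.$$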
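
The contour-integral representation can be checked numerically on a tiny alphabet. The following sketch is an illustration under stated assumptions, not the authors' computation: it takes the singleton-count statistic ($f_j(1) = 1$, $f_j(k) = 0$ otherwise), for which the inner sum reduces to $\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}$, evaluates the contour integral on a circle with the trapezoidal rule, and compares it with a brute-force sum over all multinomial outcomes. The function names, the distribution `nu`, and the radius are hypothetical choices.

```python
import cmath
import itertools
import math

def mgf_bruteforce(theta, n, nu):
    # E[exp(theta * #{j : count_j = 1})] by enumerating all multinomial outcomes.
    m = len(nu)
    total = 0.0
    for counts in itertools.product(range(n + 1), repeat=m):
        if sum(counts) != n:
            continue
        prob = float(math.factorial(n))
        for c, v in zip(counts, nu):
            prob *= v ** c / math.factorial(c)
        singles = sum(1 for c in counts if c == 1)
        total += prob * math.exp(theta * singles)
    return total

def mgf_contour(theta, n, nu, radius, num_points=256):
    # (n! / (2 pi i)) * contour integral of prod_j(...) / lam^(n+1) over |lam| = radius,
    # discretized with the trapezoidal rule in the angle psi (d lam = i lam d psi).
    a = math.exp(theta) - 1.0
    acc = 0j
    for k in range(num_points):
        lam = radius * cmath.exp(2j * math.pi * k / num_points)
        prod = 1 + 0j
        for v in nu:
            prod *= lam * v * a + cmath.exp(lam * v)
        acc += prod / lam ** n
    return (math.factorial(n) * acc / num_points).real

nu = [0.5, 0.3, 0.2]       # hypothetical distribution on three symbols
theta, n = 0.3, 4
direct = mgf_bruteforce(theta, n, nu)
contour = mgf_contour(theta, n, nu, radius=n * math.exp(-theta))
print(direct, contour)
```

Because the integrand is analytic away from the origin, any circle around $\lambda = 0$ gives the same value; the radius $ne^{-\theta}$ is used here only because it is close to the saddle point discussed in the proof of Proposition 4.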



Proof of Proposition 4: Applying Lemma 9 with $f_j(1) = 1$, $f_j(k) = 0$ for $k \ne 1$, we obtain
$$E_\nu[\exp\{\theta \tilde{S}^*_n\}] = e^{-\theta n}\frac{n!}{2\pi i}\oint g(\lambda)\,d\lambda \tag{37}$$
where
$$g(\lambda) := \frac{1}{\lambda^{n+1}}\prod_{j=1}^m \big(\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}\big).$$

The rest of the proof is an application of the saddle point method [31]. It consists of two steps: The first step is to
pick a particular contour around $\lambda = 0$ to carry out the integration. It is desirable to have a contour along which
$g(\lambda)$ behaves violently: $g(\lambda)$ is large on a small interval on the contour and significantly smaller on the rest, so
that the value of the integral can be approximated by integrating over this small interval. Such a contour can be found
by identifying a saddle point of $g(\lambda)$ at which the derivative of $g(\lambda)$ vanishes, and then picking a contour that goes
through the saddle point. The second step is to apply the Laplace method to estimate the integral along the contour.

We now apply the first step of the saddle point method: identifying the saddle point and defining the contour for
integration. Note that the derivative of $g$ is given by
$$\frac{d}{d\lambda}g(\lambda) = g(\lambda)\Big[\sum_{j=1}^m \frac{\nu_j(e^\theta - 1 + e^{\lambda\nu_j})}{\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}} - \frac{n+1}{\lambda}\Big].$$

To simplify the derivation, we select a point that is close to a saddle point, defined as the solution to
$$\sum_{j=1}^m \frac{\lambda\nu_j(e^\theta - 1 + e^{\lambda\nu_j})}{\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}} = n. \tag{38}$$

If $\lambda$ on the left-hand side were taken to be a saddle point, then the right-hand side would be $n + 1$ instead of $n$;
we will see that this error is negligible for our purposes.

Equation (38) has a unique real-valued nonnegative solution, which we denote by $\lambda_0$. To see this, note that
when restricting $\lambda$ to $[0,\infty)$, the left-hand side is a continuous and strictly increasing function of $\lambda$. Moreover, its
value is $0$ when $\lambda = 0$, and it increases to $\infty$ as $\lambda$ increases to $\infty$.
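
Since the left-hand side of the saddle point equation is continuous and increasing on $[0,\infty)$, its root can be located by bisection. The sketch below is a hedged illustration for a uniform distribution; the function names and the parameter values ($n = 50$, $m = 5000$, $\theta = 0.3$) are assumptions of this example. It also compares the root with the first-order value $ne^{-\theta}$ and with the refined correction $w \approx n\sum_j\nu_j^2(1 - e^{-2\theta})$ derived later in the proof.

```python
import math

def saddle_lhs(lam, theta, nu):
    # Left-hand side of the saddle point equation for the singleton-count statistic.
    a = math.exp(theta) - 1.0
    total = 0.0
    for v in nu:
        x = lam * v
        total += x * (a + math.exp(x)) / (x * a + math.exp(x))
    return total

def solve_saddle(theta, n, nu, iterations=200):
    # Bisection on [0, hi]; hi is doubled until the left-hand side exceeds n.
    lo, hi = 0.0, 1.0
    while saddle_lhs(hi, theta, nu) < n:
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if saddle_lhs(mid, theta, nu) < n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n, m, theta = 50, 5000, 0.3
nu = [1.0 / m] * m                      # hypothetical uniform distribution
lam0 = solve_saddle(theta, n, nu)
first_order = n * math.exp(-theta)
w_numeric = lam0 / first_order - 1.0
w_predicted = n * sum(v * v for v in nu) * (1.0 - math.exp(-2.0 * theta))
print("lambda_0    :", lam0)
print("n e^{-theta}:", first_order)
print("w (numeric) :", w_numeric, " w (predicted):", w_predicted)
```

Bisection is chosen here only for simplicity; any monotone root finder would do, because the argument in the text guarantees a single sign change.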

We now obtain an asymptotic expansion of $\lambda_0$. We first show that $\lambda_0 = O(n)$. When $\theta \ge 0$, using the fact that
$0 \le xe^{-x} \le e^{-1}$ and $0 < e^{-x} \le 1$ for $x \ge 0$, we obtain
$$\frac{1}{1 + e^{-1}(e^\theta - 1)} \le \frac{e^\theta - 1 + e^{\lambda\nu_j}}{\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}} \le e^\theta.$$
Substituting this into (38) leads to
$$ne^{-\theta} \le \lambda_0 \le n\big(1 + e^{-1}(e^\theta - 1)\big). \tag{39}$$
When $\theta < 0$, we obtain
$$e^\theta \le \frac{e^\theta - 1 + e^{\lambda\nu_j}}{\lambda\nu_j(e^\theta - 1) + e^{\lambda\nu_j}} \le \frac{1}{1 + e^{-1}(e^\theta - 1)}.$$
Substituting this into (38) leads to
$$n\big(1 + e^{-1}(e^\theta - 1)\big) \le \lambda_0 \le ne^{-\theta}. \tag{40}$$

It follows from the bounds (39), (40) and the fact that $\nu$ is in the set defined in (11) that $\lambda_0\nu_j = o(1)$. Thus the denominator of (38) satisfies
$$\lambda_0\nu_j(e^\theta - 1) + e^{\lambda_0\nu_j} = 1 + o(1).$$
Substituting this into (38) leads to
$$\sum_{j=1}^m \lambda_0\nu_j\big(e^\theta - 1 + e^{\lambda_0\nu_j}\big) = n(1 + o(1)).$$
Consequently,
$$\lambda_0 = ne^{-\theta}(1 + o(1)).$$
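To obtain a refined approximation, let $w := \lambda_0 e^\theta/n - 1$. Consequently,
$$\lambda_0 = ne^{-\theta}(1 + w). \tag{41}$$

An approximation for $w$ will be obtained: Since $\lambda_0\nu_j = O(n/m)$, we have that the numerator and denominator in
the summand of (38) satisfy
$$\lambda_0\nu_j\big(e^\theta - 1 + e^{\lambda_0\nu_j}\big) = \lambda_0\nu_j\Big(e^\theta + \lambda_0\nu_j + O\Big(\frac{n^2}{m^2}\Big)\Big),$$
$$\lambda_0\nu_j(e^\theta - 1) + e^{\lambda_0\nu_j} = 1 + \lambda_0\nu_j e^\theta + O\Big(\frac{n^2}{m^2}\Big).$$
Thus,
$$\sum_{j=1}^m \frac{\lambda_0\nu_j(e^\theta - 1 + e^{\lambda_0\nu_j})}{\lambda_0\nu_j(e^\theta - 1) + e^{\lambda_0\nu_j}} = \sum_{j=1}^m \Big[\lambda_0\nu_j e^\theta + \lambda_0^2\nu_j^2(1 - e^{2\theta}) + O\Big(\frac{n^3}{m^3}\Big)\Big].$$
Substituting this and (41) into (38) gives
$$w = n\sum_{j=1}^m \nu_j^2(1 - e^{-2\theta})\Big(1 + O\Big(\frac{n}{m}\Big)\Big) = O\Big(\frac{n}{m}\Big). \tag{42}$$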



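The integration in (37) is now carried out along the closed contour given by $\lambda = \lambda_0 e^{i\psi} = ne^{-\theta}(1 + w)e^{i\psi}$:
$$E_\nu[\exp\{\theta \tilde{S}^*_n\}] = e^{-\theta n}\frac{n!}{2\pi i}\int_{-\pi}^{\pi} g(\lambda_0 e^{i\psi})\,\lambda_0 e^{i\psi}\, i\,d\psi
= \frac{n!}{2\pi}\lambda_0^{-n}e^{-\theta n}\,\mathrm{Re}\Big[\int_{-\pi}^{\pi} h(\psi)\,d\psi\Big], \tag{43}$$
where
$$h(\psi) := e^{-in\psi}\prod_{j=1}^m \big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big). \tag{44}$$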
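We now apply the second step of the saddle point method: estimating the integral by the Laplace method. We
begin with a rough estimate of $h(\psi)$. It follows from $\lambda_0 = ne^{-\theta}(1 + w)$ and (42) that
$$h(\psi) = e^{-in\psi}\prod_{j=1}^m \big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big)
= e^{-in\psi}\exp\Big\{\sum_{j=1}^m \Big(\lambda_0\nu_j e^\theta e^{i\psi} + O\Big(\frac{n^2}{m^2}\Big)\Big)\Big\}
= e^{-in\psi}e^{n}\exp\Big\{-n(1 - e^{i\psi}) + O\Big(\frac{n^2}{m}\Big)\Big\}. \tag{45}$$

Therefore, for any $\psi \ne 0$, $|h(\psi)|$ is exponentially smaller than the value of $h(\psi)$ at $\psi = 0$. This suggests that the
integral in (43) can be approximated by integrating over a small interval around $\psi = 0$. Split the integral in (43)
into three parts:
$$I_1 = \mathrm{Re}\Big[\int_{-\pi/3}^{\pi/3} h(\psi)\,d\psi\Big], \quad
I_2 = \mathrm{Re}\Big[\int_{-\pi}^{-\pi/3} h(\psi)\,d\psi\Big], \quad
I_3 = \mathrm{Re}\Big[\int_{\pi/3}^{\pi} h(\psi)\,d\psi\Big]. \tag{46}$$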
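We first estimate $I_1$. Denote $H(\psi) = \log(h(\psi))$. Simple calculus gives
$$H'(\psi) = -in + i\sum_{j=1}^m \frac{\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + \lambda_0\nu_j e^{i\psi}\exp\{\lambda_0\nu_j e^{i\psi}\}}{\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + \exp\{\lambda_0\nu_j e^{i\psi}\}},$$
$$H''(\psi) = -\sum_{j=1}^m \exp\{\lambda_0\nu_j e^{i\psi}\}\big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + \exp\{\lambda_0\nu_j e^{i\psi}\}\big)^{-2}
\times \big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi}(1 - \lambda_0\nu_j e^{i\psi} + \lambda_0^2\nu_j^2 e^{2i\psi}) + \lambda_0\nu_j e^{i\psi}\exp\{\lambda_0\nu_j e^{i\psi}\}\big). \tag{47}$$
It is clear that $\mathrm{Im}(H(0)) = 0$. It follows from (38) that $H'(0) = 0$. Estimates of $\mathrm{Re}(H(0))$ and $H''(\psi)$ are obtained
from substituting (41) and (42) into the expressions of $H(\psi)$ and $H''(\psi)$ and applying asymptotic analysis. In sum,
$$\mathrm{Im}(H(0)) = 0,$$
$$\mathrm{Re}(H(0)) = n(1 + w) - \frac{1}{2}n^2\Big(\sum_{j=1}^m \nu_j^2\Big)(1 - e^{-2\theta}) + O\Big(\frac{n^3}{m^2}\Big), \tag{48}$$
$$H'(0) = 0, \quad H''(\psi) = -ne^{i\psi} + O\Big(\frac{n^2}{m}\Big).$$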

To obtain an upper bound on $I_1$, note that for large enough $n$ and for any $\psi \in [-\pi/3, \pi/3]$, we have $\mathrm{Re}(H''(\psi)) \le -0.4n$.
It then follows from the mean value theorem that
$$\mathrm{Re}(H(\psi)) \le H(0) - 0.2n\psi^2.$$

Consequently, for large enough $n$ and $m$,
$$I_1 \le \int_{-\pi/3}^{\pi/3} e^{H(0) - 0.2n\psi^2}\,d\psi \le e^{H(0)}\int_{-\infty}^{\infty} e^{-0.2n\psi^2}\,d\psi = e^{H(0)}\sqrt{\frac{\pi}{0.2n}}. \tag{49}$$
To obtain a lower bound on $I_1$, we begin with a bound on $\mathrm{Im}(H''(\psi))$: Since $\mathrm{Im}(H''(\psi)) = -n\sin(\psi) + O(n^2/m)$,
applying $|\sin(\psi)| \le |\psi|$, we have that for large enough $n$, for any $\psi \in [-\pi/3, \pi/3]$, $|\mathrm{Im}(H''(\psi))| \le 1.1n|\psi|$. It
also follows from (48) that $\mathrm{Re}(H''(\psi)) \ge -1.1n$. Applying the mean value theorem, we conclude that there exists
some $c > 0$ such that for $\psi \in [-\pi/3, \pi/3]$,
$$\mathrm{Re}(H(\psi)) \ge H(0) - 1.1n\psi^2,$$
$$|\mathrm{Im}(H(\psi))| \le 1.1n|\psi|^3 + c\frac{n^2}{m}\psi^2.$$

Use the short-hand notation $t_n = 0.4\min\{n^{-1/3}, \sqrt{m}/(\sqrt{c}\,n)\}$. For $\psi \in [-t_n, t_n]$, we have $\cos(\mathrm{Im}(H(\psi))) \ge 0.5$,
and thus $\mathrm{Re}(e^{H(\psi)}) \ge 0.5e^{\mathrm{Re}(H(\psi))}$. The integration for $I_1$ is further split into three parts:



Ii =Re[ / e^'^^UiP] + Re[ / e^^'^^dV] + M 



7r/3 



7r/3 



The absolute value of the first term is upper-bounded as follows: 



7r/3 



-0.2ntlil)' 



dip 



< tne 



H{0) 



-1 



1 



1 



The second term is bounded in a similar way. The third term is lower-bounded as follows: 



Re I 



'l.lnil)' 



dip 



(50) 



>0.5e^W| 



dip] 



>0.5e^(°)(^ + 0(^)) = 0.5e^(°)^(l + o(l)). 
Vl.ln ?^^n VI. In 

where the last inequality follows from an argument similar to (50). Combining these bounds together, we obtain



h > Rel 



^^(^)d^]-\Re[ " e^^^UiP]\ 



Rel 



-tt/S 



tt/S 



^^(O)OV^ 



Combining this and (49) leads to



gO(l) ^ gn{l+o{l)) 



=o{i) 



n 



(51) 



where the last equality follows from the estimate of $H(0)$ given in (48) and (42).

We now estimate $I_2$ and $I_3$. For $\psi \in [-\pi, -\pi/3] \cup [\pi/3, \pi]$, we obtain from (45) that $|h(\psi)| \le \exp\{0.5n + O(n^2/m)\}$,

which implies Re[/2] + Re[/3] = ©(e*^'^"). This shows that I2 and I3 are much smaller than Ii. Thus, the integral 
in (43) can be approximated by the estimate of $I_1$: Substituting (51) and (48) into (43), we obtain

= £^Ao"e-^"/i(l + o(l)) 

= -Ao-'^e-^"e^(o)^eO(i)(l + o(l)) 



(1 + n V ^1(1 - + 0(4))"'^ 



m 3 

exp{ln2E.^)(l-e-^^) + 0(^)}e««) 

in ™ 3 



Stirling's formula gives $n!\,n^{-n}e^{n} = \sqrt{2\pi n}\,(1 + O(1/n))$. The claim of the proposition is obtained on taking logarithms on both
sides. ■

B. Approximation to the logarithmic moment generating function for distributions not in the set defined in (11)

We also need to consider distributions in $\mathcal{Q}_n$ that are not in the set defined in (11). For any such $q$, the set of indices $S_q := \{j \in [m] :
q_j > \gamma m^{-1}\}$ is non-empty. Now fix a small constant $\eta > 0$, and consider each index $j$ in $S_q$ in two separate cases,
according to whether $nq_j \ge \eta$. Denote
$$W_\eta(q) := \{j : nq_j \ge \eta\}, \quad \beta(q) := \sum_{j \in W_\eta(q)} q_j.$$

Proposition 5 below addresses the case where $\beta(q)$ is large. It implies that the probability of missed detection
associated with such a distribution is much smaller than that associated with the worst-case distributions: The
probability decays exponentially fast with respect to $n$, which is larger than $n^2/m$. Proposition 6 considers the
alternate case, and shows that if $\beta(q)$ is not large, then a bound similar to that in Proposition 4 holds.

Proposition 5. For all sufficiently small $\eta > 0$, any $\theta \in (0, 0.5]$, and any $\bar{\beta} > 0$, there exists $n_0$ such that for any
$n \ge n_0$, and any $\nu$ satisfying $\beta(\nu) \ge \bar{\beta}$, the following holds,
$$\Lambda_{\nu,\tilde{S}^*_n}(\theta) \le -\beta(\nu)\alpha(\theta)n,$$
where $\alpha(\theta) > 0$.

Proposition 6. For any $\delta > 0$, $\theta \in (0, 0.5]$, $\bar{\eta} > 0$, there exist $\eta \in (0, \bar{\eta})$, $\bar{\beta} > 0$, and $n_0$ such that for any $n \ge n_0$,
and any $\nu$ satisfying $\beta(\nu) < \bar{\beta}$, the following holds,

The proofs of Proposition 5 and Proposition 6 use steps similar to those leading to the upper bound in Proposition 4.
However, the approximation given by (41) and (42) is no longer valid, so a different approximation is
required. The conclusions on the existence and uniqueness of the solution $\lambda_0$ and the bounds in (39) are still valid,
and our proof begins from there.

To simplify the presentation, we use the following notation similar to the small "o" notation: We write $x = o^\eta(1)$
whenever there exists a function $s(\eta)$ that does not depend on $\theta$, $n$, and $\nu$, such that $|x| \le s(\eta)$ and $\lim_{\eta\to 0} s(\eta) = 0$.

Consider any $\eta$ and $\nu$. Write $W_\eta = W_\eta(\nu)$. For any $j \notin W_\eta$, we obtain the expansion of the summand in (38)
via the mean value theorem:
$$\frac{\lambda_0\nu_j(e^\theta - 1 + e^{\lambda_0\nu_j})}{\lambda_0\nu_j(e^\theta - 1) + e^{\lambda_0\nu_j}} = \lambda_0\nu_j e^\theta + \lambda_0^2\nu_j^2(1 - e^{2\theta})(1 + o^\eta(1)).$$
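For any $j \in W_\eta$, the following equality holds:
$$\frac{\lambda_0\nu_j(e^\theta - 1 + e^{\lambda_0\nu_j})}{\lambda_0\nu_j(e^\theta - 1) + e^{\lambda_0\nu_j}} = \lambda_0\nu_j e^\theta D_j,$$
where
$$D_j := \frac{e^{-\theta} + (1 - e^{-\theta})e^{-\lambda_0\nu_j}}{1 + \lambda_0\nu_j e^{-\lambda_0\nu_j}(e^\theta - 1)} \ge e^{-2\theta}. \tag{52}$$

Substituting these estimates into (38) leads to
$$\lambda_0\Big(1 + \sum_{j \in W_\eta} \nu_j(D_j - 1)\Big)e^\theta + \sum_{j \notin W_\eta} \lambda_0^2\nu_j^2(1 - e^{2\theta})(1 + o^\eta(1)) = n. \tag{53}$$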

Applying Ao Ej^w,, ^| < ^ Sj^w, < ^ gi^es, 



Introducing a variable w as before, 



TIC 



On substituting (54i into (53 1, we obtain 



n( 

^ = n r^^^^ — + '''(I)) = ^'(1)- ^^^^ 



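In the proofs of both propositions, we integrate (37) along the closed contour corresponding to $\lambda = \lambda_0 e^{i\psi}$ from
$\psi = -\pi$ to $\psi = \pi$, and use the same definition of $h(\psi)$ given in (44) and $H(\psi) = \log(h(\psi))$. The integral is given
in (43) and our task is to estimate it. We now give the details.
Proof of Proposition 5: We first show that for any $\psi$,
$$\mathrm{Re}(H(\psi)) \le H(0) = \sum_j \big[\lambda_0\nu_j + \log\big(1 + \lambda_0\nu_j e^{-\lambda_0\nu_j}(e^\theta - 1)\big)\big]. \tag{56}$$

Thus to bound the integral in (43), we only need to bound $H(0)$. For $\psi \in [-\frac{1}{2}\pi, \frac{1}{2}\pi]$, the summand in the expression
of $\mathrm{Re}(H(\psi))$ given in (47) is bounded as follows:
$$\mathrm{Re}\big[\log\big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big)\big]
= \mathrm{Re}\big[\log\big(e^{\lambda_0\nu_j e^{i\psi}}\big) + \log\big(1 + \lambda_0\nu_j(e^\theta - 1)e^{i\psi}e^{-\lambda_0\nu_j e^{i\psi}}\big)\big]
\le \lambda_0\nu_j\cos\psi + \log\big(1 + \lambda_0\nu_j e^{-\lambda_0\nu_j\cos\psi}(e^\theta - 1)\big). \tag{57}$$

The right-hand side is a convex function of $\cos\psi$ for $\psi \in [-\frac{1}{2}\pi, \frac{1}{2}\pi]$. Thus, it achieves its maximum value at
$\cos\psi = 1$ or $\cos\psi = 0$. Note that its value at $\cos\psi = 1$ is exactly equal to the summand in $H(0)$. Moreover, we
can show that its value at $\cos\psi = 1$ is no smaller than its value at $\cos\psi = 0$:
$$\lambda_0\nu_j + \log\big(1 + \lambda_0\nu_j(e^\theta - 1)e^{-\lambda_0\nu_j}\big) - \log\big(1 + \lambda_0\nu_j(e^\theta - 1)\big) \ge 0,$$
where the inequality follows from $\theta \ge 0$. This leads to (56) for $\psi \in [-\frac{1}{2}\pi, \frac{1}{2}\pi]$.
For $\psi \in [-\pi, -\frac{1}{2}\pi] \cup [\frac{1}{2}\pi, \pi]$, we have $|e^{\lambda_0\nu_j e^{i\psi}}| \le 1$. Consequently,
$$\big|\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big| \le 1 + \lambda_0\nu_j(e^\theta - 1),$$
which leads to
$$\mathrm{Re}\big[\log\big(\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big)\big] \le \log\big(1 + \lambda_0\nu_j(e^\theta - 1)\big). \tag{58}$$
The right-hand side of (58) is equal to the value of the right-hand side of (57) at $\cos\psi = 0$, which has been shown
in the previous paragraph to be no larger than the summand in $H(0)$. This leads to (56) for $\psi \in [-\pi, -\frac{1}{2}\pi] \cup [\frac{1}{2}\pi, \pi]$.
We now approximate the right-hand side of (56): For $j \notin W_\eta$, we have
$$\lambda_0\nu_j + \log\big(1 + \lambda_0\nu_j e^{-\lambda_0\nu_j}(e^\theta - 1)\big) = \lambda_0\nu_j e^\theta + \tfrac{1}{2}\lambda_0^2\nu_j^2(1 - e^{2\theta})(1 + o^\eta(1)).$$
For $j \in W_\eta$, we have the inequality
$$\lambda_0\nu_j + \log\big(1 + \lambda_0\nu_j e^{-\lambda_0\nu_j}(e^\theta - 1)\big) \le \lambda_0\nu_j e^\theta + \lambda_0\nu_j(1 - e^{-\lambda_0\nu_j})(1 - e^\theta).$$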



Substituting these two estimates, and ( [54] ), ( [56[ ) into ( |43] ) leads to 

E,[exp{^(5:)}] < |lAo"e-^"exp{F(0)} (59) 
< n!Ao "e-^" exp{ [Aoz^.e^ + ^Agz.|(l - e^^l + o'?(l))]} 

X exp{ [Ao^.e^ + Aoz^,(l - e~^«^0(l " e')]} 



n\e 



1+ ^^.(Z), -l))'^(l + 



n 

jew, 

, ^(1 + + E,ew, - e~^'>^^){e-' - 1) 

< — exp{-nlog(l + ») + i + 

XErtw.'-|(l-<^-'"')(l + °''(l)) „ 

X exp{„[ •'AD, - 1) - 1 + '^^frv^^''";;7''ir"'' ''- 

We now bound each exponential term on the right-hand side of ( [60| ). Applying ( [55] ) and the lower-bound on Dj in 
( 52 1 gives the following bound on the second term: 



< -|e-2^nu;(l + o^(l)). (61) 



The first exponential term satisfies 



TIW 

- n\og{l + w) + — - = -nwJ^{l), (62) 

which follows from (52) and $w = o^\eta(1)$. Combining (61) and (62) implies that for small enough $\eta$, the sum of the
first and second term is negative.
The exponent in the last term on the right-hand side of (60) is bounded as follows:



1 -1 + - 



11 



< 



<- 



(63) 



1)] 



1 + E,ew,^.(^.-1) 

where the first inequality follows from Jensen's inequality and the second follows from J2j^w — ^■ 

We first bound the summand in the numerator on the right-hand side of (63 1. Consider any j G W,,. Let x := XqUj. 
Applying the formula of Dj in ([52]) gives 



(D, - If + {I - e-^ie- 



1) 



+ e- 



1) + (xe-"(e^-l))' 



(64) 



[l + xe-^(e^ - 1)) 

Let t{x) = (1 - e-'')(e-^ - 1) + (xe-^(e^ - 1)) . Note that j G implies nuj > r/, which combined with ([39] 
implies x = Xqi^j > 7]e~^. Since for 9 G (0, 0.5], t{x) is strictly decreasing on [0, oo), we obtain t{x) < t{rje^^) < 
Substituting this into (64i and using the elementary fact that 



+ e- 



(1 + 



xe 



1))' 



<e- 



-35 



{Dj - If + (1 - e-^)(e-' - 1) < -e-^H{rie 



we obtain 



The denominator of on the right-hand side of ( 163) is positive and upper-bounded by 1 because Dj < 1. Combining 
the bounds on the numerator and denominator gives a bound on the exponent in the last term on the right-hand 



side of (60 1 



1 + E,ew,'^.(^.-1) 



< -I3{j^)aie) < 0, 



(65) 



where 



a{e) 



-36* I 



(1 - e-'"''){e-' - 1) + {ve-'e-^''\e' - I))' 



Combining (61 1, (62i and ([65]) and using the fact that the right-hand sides of m^ (62i are negative, we obtain: 



E,[exp{0(5:)}] < 



me 



-V2TTn exp{—nf3{i')a{9)} . 



Taking the logarithm on both side and applying Stirling's formula leads to 

^v,si (0) < -nP{v)a{e) + \ log(2^n) + 0( 



1. 



n 



Since $\beta(\nu) \ge \bar{\beta}$, the second term $\frac{1}{2}\log(2\pi n)$ becomes negligible compared to the first term for large $n$. This leads
to the claim of the proposition. ■
Proof of Proposition 6: We pick $\bar{\beta}$ so that $\bar{\beta} = o^\eta(1)$. It then follows that
E v,{D,-l) = o\l) 



(66) 
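Substituting this into (54) and (55) gives
$$\lambda_0 = ne^{-\theta}(1 + o^\eta(1)), \quad w = n\Big(\sum_{j \notin W_\eta} \nu_j^2\Big)(1 - e^{-2\theta})(1 + o^\eta(1)). \tag{67}$$

The rest of the proof is similar to the proof of Proposition 4. Applying (56) to $j \in W_\eta$, we obtain
$$|h(\psi)| \le |e^{-in\psi}| \prod_{j \notin W_\eta} \big|\lambda_0\nu_j(e^\theta - 1)e^{i\psi} + e^{\lambda_0\nu_j e^{i\psi}}\big| \prod_{j \in W_\eta} \exp\big\{\lambda_0\nu_j + \log\big(1 + \lambda_0\nu_j e^{-\lambda_0\nu_j}(e^\theta - 1)\big)\big\}$$
$$\le |e^{-in\psi}| \exp\Big\{\Big(\sum_{j \notin W_\eta} \lambda_0\nu_j e^\theta\cos\psi\,(1 + o^\eta(1))\Big) + \sum_{j \in W_\eta} \lambda_0\nu_j e^\theta\Big\} \tag{68}$$
$$= e^{n}\exp\{-n(1 - \cos\psi)(1 + o^\eta(1))\}.$$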



It is clear from (68) that the integrand is large on the interval around $0$. Thus, we again split the integral in (43)
into three parts $I_1$, $I_2$ and $I_3$, as in (46). We will show later that $I_2$ and $I_3$ are much smaller than $I_1$.
We first upper-bound $I_1$. Similar to (48), we have
$$\mathrm{Im}(H(0)) = 0, \quad \mathrm{Re}(H'(0)) = 0, \quad \mathrm{Im}(H'(0)) = 0.$$

We now estimate $H''(\psi)$, whose exact formula is given in (47). Consider $j \in W_\eta$. For $\psi \in [-\pi/3, \pi/3]$, we have
the following inequalities:

|1 + Aoz^j(e^ - l)e*'^exp{-Aoz^je*'^}| > 1, 

\XoUj{e^ - l)e'^{l - Xoiyje'^ + Agi/Je^^^) exp{-Aoz^ie^'^} + Xoiyje'^\<WOXoiyje^ . 

Substituting these into ([47), we obtain \H"{'ip)\ < 100^n(l + o^{l)) = no''(l). Substituting this and the estimate 
( [67) into the expression of H"{ip) leads to 

Note that the assumption of the proposition allows us to take very small $\eta$. We choose it small enough so that the
term $o^\eta(1)$ in the above equation is smaller than $0.05$. Then for large enough $n$, for any $\psi \in [-\pi/3, \pi/3]$, we have
$\mathrm{Re}(H''(\psi)) \le -0.4n$. It follows from the mean value theorem that
$$\mathrm{Re}(H(\psi)) \le H(0) - 0.2n\psi^2.$$



Consequently, for large enough n and m, we have 

J-n/3 J- 



e-°-^^^d^ = e^(0)^^. (69) 
/3 J-00 v0.4n 



We now bound the tails I2 and Is. For G [— vr, — 7r/3]U[7r/3, vr], we obtain from (68) that 1/1(^^)1 < exp{0.5n(l+ 
0^(1))}. Thus, for small enough rj, we have 

Re[/2]+Re[/3] = 0(e°-'"). 
Substituting the estimate for Ii, I2 and into ([43) gives 



E.[exp{0(^:)}] < ^^Ao"e-^'^e^(°)(l + o(l)). 
V l.onvr 



Note that the right-hand side is almost the same as the right-hand side of ( [59) except for the multiplication term 
Vi.6np (^ ~'~ '^(^))- Thus, we can bound it using the right-hand side of (60) after taking into account this additional 
multiplication term. We obtain 

n!e" h^^Ei^w z^Kl-e-2^)(l+oni)) 



Substituting ( [66) and Stirling's formula into the right-hand side of the above inequality leads to 

E,[exp{e(5:-n)}] < ^exp{-in2( Y,y]){l-e-^'){l+o^m{l+o{l)). 

Taking logarithm on both sides gives the claim of this proposition. 
C. Proofs of Theorem 3 and Theorem 1

Proof of Theorem 3: Let $\Lambda_q(\theta)$ be the limit of the normalized logarithmic moment generating function $\Lambda_{q(n),\tilde{S}^*_n}$:
$$\Lambda_q(\theta) := \lim_{n\to\infty} \frac{m}{n^2}\Lambda_{q(n),\tilde{S}^*_n}(\theta).$$
It follows from Proposition 4 that the limit exists and is given by the following function:

Denote its Fenchel-Legendre transformation by
$$\Lambda^*_q(t) := \sup_\theta[\theta t - \Lambda_q(\theta)].$$

It follows from the Gartner-Ellis Theorem B6l Theorem 2.3.6] that 



^log(Pg( 



-limsup^log(P,(„,{5: < Ep[5:] + -r}) 



lim sup ^ log(Pg(„){5; > -Ep[S^] -n r}) 



inf Al(t)=At(-r-l) 

i> — r— 1 

:sup{0(-l-r)-i(e-2^-l)^(q)}. 
6»0 



where — r — 1 is the normalized limit of — EpfS**] — n — by Lemma |7| ■ 
Proof of Theorem 1: The proof for the result on the generalized error exponent of false alarm $J_F(\phi^*)$ is very
similar to that of Theorem 3. Let $\Lambda_0(\theta)$ be the limit of the normalized logarithmic moment generating function $\Lambda_{p,\tilde{S}^*_n}$:
$$\Lambda_0(\theta) := \lim_{n\to\infty} \frac{m}{n^2}\Lambda_{p,\tilde{S}^*_n}(\theta).$$

It follows from Proposition 4 that the limit exists and is given by the following function:
$$\Lambda_0(\theta) = \tfrac{1}{2}(e^{-2\theta} - 1).$$
Let $\Lambda^*_0(t) = \sup_\theta[\theta t - \Lambda_0(\theta)]$. It follows from the Gärtner-Ellis Theorem that



771 

limsup log(Pp((/); = 1)) 

n— >oo IT' 

2 

TTl ~ Ti 

limsup log(Pp{S'; < -Ep[S'.*] - n r}) 



= inf Al[t)=Al{-T-\) 

r< — T— 1 

= sup{0(-r - 1) - i(e-2^ - 1)} = rp{T). 

B 
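
The Fenchel-Legendre transform of $\Lambda_0(\theta) = \frac{1}{2}(e^{-2\theta} - 1)$ can be worked out by hand and checked numerically. The sketch below is illustrative only: it assumes the supremum is taken over all real $\theta$, in which case the stationary point is $\theta = -\frac{1}{2}\log(-t)$ for $t < 0$ and the transform equals $\frac{1}{2}(t + 1) - \frac{1}{2}t\log(-t)$. Evaluated at $t = -\tau - 1$ this gives $\frac{1}{2}[(1+\tau)\log(1+\tau) - \tau]$. The function names and grid parameters are assumptions of this example.

```python
import math

def log_mgf_limit(theta):
    # Lambda_0(theta) = (1/2) (e^{-2 theta} - 1), the limit stated in the proof.
    return 0.5 * (math.exp(-2.0 * theta) - 1.0)

def legendre_numeric(t, theta_min=-5.0, theta_max=5.0, grid=100_000):
    # Brute-force supremum of theta * t - Lambda_0(theta) over a uniform grid.
    best = -math.inf
    for k in range(grid + 1):
        theta = theta_min + (theta_max - theta_min) * k / grid
        best = max(best, theta * t - log_mgf_limit(theta))
    return best

def legendre_closed_form(t):
    # Closed form for t < 0, obtained from the stationarity condition e^{-2 theta} = -t.
    return 0.5 * (t + 1.0) - 0.5 * t * math.log(-t)

for tau in (0.0, 0.5, 1.0):
    t = -tau - 1.0
    print(tau, legendre_numeric(t), legendre_closed_form(t))
```

The transform vanishes at $t = -1$ (that is, $\tau = 0$) and is strictly positive for $\tau > 0$, which matches the role of $\tau$ as the normalized distance of the threshold from the mean under the null hypothesis.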

For the result on the generalized error exponent of missed detection $J_M(\phi^*)$, we prove an upper bound and a
lower bound. For the upper bound, consider the sequence of distributions given in (21) and (22) and let $q^*$ denote
this sequence. The rate function associated with $q^*$ satisfies
$$J_{q^*}(\phi^*, \tau) = J^*_M(\tau).$$

On the other hand, since each element of $q^*$ is in the set of alternative distributions, it follows from the definition
of $J_M(\phi^*)$ and $J_{q^*}(\phi^*, \tau)$ that
$$J_M(\phi^*) \le J_{q^*}(\phi^*, \tau).$$

To obtain the lower bound on $J_M(\phi^*)$, we apply Proposition 5 and Proposition 6. We only need to prove it for
the case r G [0, The case r = will then follow from a continuity argument. 
Take $\theta_0$ to be the maximizer in the optimization problem defining $J^*_M(\tau)$ (see (16)). It is not difficult to see that
$\theta_0 > 0$. It follows from Lemma 1 that

m g|>(l + K([^|M))(l-/?(g))(l + o(l)). 

Thus, for any $\delta > 0$, we can choose $\eta, \beta_0$ small enough so that for any $q \in \mathcal{Q}_n$ satisfying $\beta(q) < \beta_0$, we have
$m\sum_{j \notin W_\eta} q_j^2 \ge (1 + \kappa(\epsilon))(1 - \delta)$. It then follows from Proposition 6 that for large enough $n$,
$$\Lambda_{q,\tilde{S}^*_n}(\theta_0) \le \frac{n^2}{2m}(1 + \kappa(\epsilon))(e^{-2\theta_0} - 1)(1 - \delta)^2 + O(1). \tag{70}$$

For $q$ satisfying $\beta(q) \ge \beta_0$, it follows from Proposition 5 that for large enough $n$,
$$\Lambda_{q,\tilde{S}^*_n}(\theta_0) \le -\beta_0\alpha(\theta_0)n. \tag{71}$$

We can pick $n$ large enough so that the right-hand side of (71) is smaller than the right-hand side of (70). Applying
the Chernoff bound leads to

log(sup P,i<Pl = 0)) 

<-0o(Ep[S;]-r„)+ sup A ^.(^o) 

geQ„ 

<eoiTn-E,[S:])+'~{l + K{e)){e-''" - l){l-6f + 0{l). 

Thus,
$$J_M(\phi^*) \ge \theta_0(-1 - \tau) - \tfrac{1}{2}(e^{-2\theta_0} - 1)(1 + \kappa(\epsilon))(1 - \delta)^2.$$
This holds for any $\delta > 0$. Consequently, $J_M(\phi^*) \ge J^*_M(\tau)$. ■

Appendix C
Proofs of Theorem 4 and Theorem 5

A. Proof of Theorem 4

The performance of $\phi^{*+}$ is analyzed by connecting it to the performance of $\phi^*$. We first show that its probability
of missed detection is no larger than that of $\phi^*$. We then apply a result similar to Proposition 4 to analyze its
probability of false alarm. Consider the statistic



2 



0„ — ~0>n 



n. 



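Define
$$\Lambda_{\nu,\tilde{S}^{*+}_n}(\theta) := \log\big(E_\nu[\exp\{\theta \tilde{S}^{*+}_n\}]\big). \tag{72}$$

Proposition 7. For any $\nu$ in the set defined in (11), the logarithmic moment generating function for the statistic $\tilde{S}^{*+}_n$ has the following
asymptotic expansion
$$\Lambda_{\nu,\tilde{S}^{*+}_n}(\theta) = \frac{n^2}{m}\Big(m\sum_{j=1}^m \nu_j^2\Big)\Big\{-\theta + \tfrac{1}{2}\big[e^{-2\theta} - (1 - 2\theta)\big]\Big\} + O\Big(\frac{n^3}{m^2}\Big) + O(1). \tag{73}$$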



Proof of Proposition 7: The proof follows exactly the same steps as that of Proposition 4, except that some of the
approximations are different. We describe only the key steps and highlight the differences: First, the estimate
of the saddle point is the same as (41) and (42). Second, different from (43), we have the following expression of
the moment generating function:



E,[exp{0(5:+)}] = |lAo"e-^"Re[ |" h^^. 
where instead of (44),



h{^) ■= e-'^^\l{X^v,{e' - l)e'^ + e^°^^-'" + J] i^^(e^'" - 1)). 

j=l 1=2 

It follows from Aq = n~^(l + o(l)) that the last term is negligible when V2 = ^ and I < oo. 



Z=2 



The asymptotic approximation of /i(V') is the same as that in (45 1: 



Finally, the approximations of H (0) , H' {0) , H" {ip) are the same as in ( |48] ). Therefore, A^^.+ has the same 
asymptotic approximation as that of A^^. up to an approximation error of O(^). ■ 
Proof of Theorem 4: Since $w_l \ge 0$ for $l \ge 2$, we have



Thus, for the same sequence of thresholds f„, we have 

Pg{S:+ < fn] < Pg{S: < f„} 



On the other hand, since Aj^^.+ has the same asymptotic approximation as that of A^^. up to an approximation 
error of O(^), we have 



logPp{5:+ >-n + f„} 
= logPp{5r <-f„} 
<e(-f„)+Ap^^.+ (-0) 

fn + ^{9 + \[e'' - (1 + 2^)]) + O(^) + 0(1). 



which is the same bound as that for log Pp{S^ > —n + fn}. ■ 
B. Proof of Theorem 5

The proof of Theorem 5 follows exactly the same steps as those in the proof of Theorem 1. We use Proposition 8,
Proposition 9 and Proposition 10 in place of Proposition 4, Proposition 5 and Proposition 6.



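Denote

$$\Lambda_{\nu,S^w}(\theta) := \log(\mathsf{E}_\nu[\exp\{\theta S^w_n\}]).$$

Proposition 8. For any $\nu$ ∈ V^, the logarithmic moment generating function for the statistic $S^w_n$ has the following
asymptotic expansion

$$\Lambda_{\nu,S^w}(\theta) = \frac{1}{2} n^2 \Big(\sum_{j=1}^m (p_j - \nu_j)^2\Big)\theta + \frac{1}{2} n^2 \Big(\sum_{j=1}^m \nu_j^2\Big)[e^{\theta} - (1 + \theta)] + O\Big(\frac{n^3}{m^2}\Big) + O(1).$$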

Proposition 9. For all sufficiently small rj > 0, any 9 £ [—1, 0) and any /? > 0. There exists no such that for any 
n > no, and any v satisfying (i{v) < (3, the following holds, 

A,,5w(e) < -(3{q)a'{e)n 

where $a'(\theta) > 0$ for $\theta \in [-1, 0)$.
Proposition 10. For any $\delta > 0$, $\theta \in [-1, 0)$, $\bar\eta > 0$, there exists $\eta \in (0, \bar\eta)$, $\beta > 0$, and $n_0$ such that for any
$n > n_0$, and any $\nu$ satisfying $\beta(\nu) < \beta$, the following holds,

A.,5w(0<^[(m {P3-^3?)(^ + \{^ E -{i + emi-5). 

We only outline the proof for Proposition 8.

Proof of Proposition 8: The steps are the same as those in the proof of Proposition 4. Again, we describe the
main steps and highlight the differences. First, the estimate of the saddle point is different from that in (41) and
(42). We have

Ao = n(l + w), 

w = n( J] - 5] u^{e' - 1))(1 + 0(^)). 



Second, different from (43), we have the following expression for the moment generating function:

E:[exp{9S^}] = £[Ao"Re[ |" hi^^J)d^l^] 

where 

m 

2 

-"'^expW'^ + Of — )}, 
m 



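Finally, the approximation of $\mathrm{Re}(H(0))$ is different from that in (48):

$$\mathrm{Re}(H(0)) = n(1 + w) + \frac{1}{2} n^2 \Big(\sum_{j=1}^m (p_j - \nu_j)^2\Big)\theta + \frac{1}{2} n^2 \Big(\sum_{j=1}^m \nu_j^2\Big)(e^{\theta} - 1 - \theta) + O\Big(\frac{n^3}{m^2}\Big).$$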



The rest of the steps are the same as those in Proposition 4. ■
Proof of Theorem 5: We first prove the lower bound on $J_F(\phi^w)$. Substituting the asymptotic approximation
of $\Lambda_{p,S^w}(\theta)$ given in Proposition 8 into the Chernoff bound, we obtain that for $\theta > 0$,

$$\begin{aligned}
\log P_p(\phi^w_n = 1)
&\le -\theta\tau_n + \Lambda_{p,S^w}(\theta) \\
&= -\theta\tau_n + n^2 \Big(\sum_{j=1}^m p_j^2\Big)\frac{1}{2}[e^{\theta} - (1 + \theta)] + O\Big(\frac{n^3}{m^2}\Big) + O(1).
\end{aligned}$$

Since TnY2T=iP^j ^ 7^' which is a consequence of Assumption pi we have 



$$J_F(\phi^w) \ge \sup_{\theta > 0}\Big\{\tau\theta - \frac{1}{2}\gamma^2[e^{\theta} - (1 + \theta)]\Big\} > 0.$$



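Lower-bounding $J_M(\phi^w)$ requires us to obtain a uniform bound on the probability $P_q(\phi_n = 0)$ over $q \in \mathcal{Q}_n$.
We apply Proposition 9 and Proposition 10. Using an argument similar to the proof of Theorem 1, we conclude
that for any $\delta > 0$, and $\theta \in (0, 1]$, for large enough $n$,

$$\begin{aligned}
\log P_q(\phi_n = 0)
&\le \theta\tau_n + \Lambda_{q,S^w}(-\theta) \\
&\le \theta\tau_n - \Big[\frac{1}{2}\theta n^2 \sum_{j=1}^m (q_j - p_j)^2 - \frac{1}{2} n^2 \sum_{j=1}^m q_j^2\,\big(e^{-\theta} - (1 - \theta)\big)\Big](1 - \delta).
\end{aligned}$$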
We need to upper-bound the right-hand side uniformly over all $q \in \mathcal{Q}_n$. Using the inequalities $q_j^2 \le 2p_j^2 + 2(p_j - q_j)^2$
and $e^{-\theta} - (1 - \theta) \le \frac{1}{2}\theta^2$ for $\theta \ge 0$, we obtain



n 

j=i j=i j=i 



= ^^e[-{m^{q, - )(1 -9) + 9{m^p])]{l -6) + 9^ + 0(1). 
j=i i=i 

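Applying $m\sum_{j=1}^m (q_j - p_j)^2 \ge 4\epsilon^2$ and $m\sum_{j=1}^m p_j^2 \le \gamma^2$ leads to

$$\frac{m}{n^2}\log[P_M(\phi^w_n)] \le \frac{1}{2}\theta[-4\epsilon^2(1 - \theta) + \theta\gamma^2](1 - \delta) + \theta\tau + o(1).$$

Taking $\theta = (4\epsilon^2(1 - \delta) - 2\tau)/[(8\epsilon^2 + 2\gamma^2)(1 - \delta)]$, and taking the limit on both sides gives

$$J_M(\phi^w) \ge \frac{1}{4}\,\frac{(4\epsilon^2(1 - \delta) - 2\tau)^2}{(8\epsilon^2 + 2\gamma^2)(1 - \delta)}.$$

Since this holds for all $\delta > 0$, and $2\tau < 4\epsilon^2$, we conclude that $J_M(\phi^w) \ge \frac{(4\epsilon^2 - 2\tau)^2}{4(8\epsilon^2 + 2\gamma^2)} > 0$.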
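
The choice of $\theta$ in this step can be checked numerically. The following is a minimal sketch, assuming the normalized bound has the form $\frac12\theta[-4\epsilon^2(1-\theta) + \theta\gamma^2](1-\delta) + \theta\tau$; the function names `bound` and `theta_star` and the parameter `gamma2` (standing for $\gamma^2$) are introduced here for illustration only.

```python
def bound(theta, tau, eps, gamma2, delta):
    """Assumed normalized upper bound on (m / n^2) * log P_M as a function of theta."""
    return 0.5 * theta * (-4 * eps**2 * (1 - theta) + theta * gamma2) * (1 - delta) + theta * tau


def theta_star(tau, eps, gamma2, delta):
    """Stationary point of the quadratic `bound`, matching the choice of theta in the text."""
    return (4 * eps**2 * (1 - delta) - 2 * tau) / ((8 * eps**2 + 2 * gamma2) * (1 - delta))


if __name__ == "__main__":
    # Illustrative parameter values (assumptions, not taken from the paper).
    tau, eps, gamma2, delta = 0.1, 0.4, 1.5, 0.05
    t = theta_star(tau, eps, gamma2, delta)
    print("theta* =", t, "bound at theta* =", bound(t, tau, eps, gamma2, delta))
```

Because the bound is a convex quadratic in $\theta$, its minimum value is $-\frac14 (4\epsilon^2(1-\delta) - 2\tau)^2 / [(8\epsilon^2 + 2\gamma^2)(1-\delta)]$, which is negative whenever $2\tau < 4\epsilon^2(1-\delta)$; the negative of this value is the resulting lower bound on the exponent.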



Appendix D
Proof of Theorem 2

We first give an outline of the proof: Consider any $\tau \in [0, \kappa(\epsilon)]$. Given $\delta > 0$, a sequence of events $\{B_{n,\tau,\delta}\}$ is
constructed so that the following is satisfied:

(i) The probability of the event is close to the probability of false alarm: 

tn 

limsup ^ log{P p{Bn,r,5)) < JKr) - S. (74) 

n— >oo 'T- 

(ii) For any $z_1^n$ satisfying $\{Z_1^n = z_1^n\} \subset B_{n,\tau,\delta}$, the following uniform bound on the likelihood ratio holds:

^2 

sup ^(z?) > exp{ {JUt) - JUr) + 5)}. (75) 

The lower bound on $P_M$ is then obtained from the following inequality:

PMi(l)n) > sup Pg({(An = 0} H Bn,r,5) 

> sup ^ ({(/>« = 0} n Bn,r,s)Pp{{<Pn = 0} D Bn,r,s) (76) 

n 

> sup \ ({(/>„ = 0} n Bn,r,5)(Pp(5n,r,5) " Pp({0n = !}))• 



The first term on the right-hand side is lower-bounded in (75). The second term can be shown to have the same
large deviations limit as that of $P_p(B_{n,\tau,\delta})$:

$$P_p(\{\phi_n = 0\} \cap B_{n,\tau,\delta}) \ge P_p(B_{n,\tau,\delta}) - P_p(\{\phi_n = 1\}). \qquad (77)$$

The inequality in (74) ensures that $P_p(\{\phi_n = 1\})$ is negligible compared to $P_p(B_{n,\tau,\delta})$.

The technique of using uniform lower-bounds on likelihood ratio (LR) to prove lower-bounds of probability of 
missed detection has been applied in ||5l [H : In this prior work, a uniform bound on LR is obtained over all possible 
$z_1^n$. To prove the tight hardness result as in Theorem 2, we require the bound on the LR to hold uniformly for the
sequences in the event $B_n$, instead of all sequences. This gives us the freedom to optimize $B_n$ to obtain the tightest
bound.



The technique to prove ( |75) has been previously used in providing hardness results for composite and hypothesis 
testing problems ||5l|3j|35l. First, construct a collection of distributions so that for each distribution q, the likelihood 
ratio $q/p$ has a simple expression. Second, show that for all observations $z_1^n := \{z_1, \ldots, z_n\}$ in the event $B_n$, the
average of $P_q\{Z_1^n = z_1^n\}/P_p\{Z_1^n = z_1^n\}$ over the collection of distributions can be lower-bounded, which in
turn lower-bounds the left-hand side of (75). The proofs for $\epsilon < 0.5$ and $\epsilon \ge 0.5$ use different constructions of
distributions.

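We now carry out these two steps: construction of the event $B_{n,\tau,\delta}$ and lower-bounding the likelihood ratio.

A. Construction of $B_{n,\tau,\delta}$

Define the event

$$B_{n,\tau,\delta} := \Big\{\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 1\} \ge n - (1 + \tau + \delta)\frac{n^2}{m},\ \ \sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\} \ge \frac{1}{2}(1 + \tau - \delta)\frac{n^2}{m}\Big\}. \qquad (78)$$

The probability of the event $B_{n,\tau,\delta}$ has the following asymptotic approximation:

Lemma 10. For $\tau = 0$ and any $\delta > 0$,

$$\lim_{n \to \infty} P_p(B_{n,\tau,\delta}) = 1. \qquad (79)$$

For any $\tau, \delta$ satisfying $\tau > \delta > 0$,

$$\lim_{n \to \infty} -\frac{m}{n^2}\log P_p(B_{n,\tau,\delta}) = J^*_F(\tau - \delta). \qquad (80)$$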

n— >-oo n 

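Proof of Lemma 10: First consider the case where $\tau = 0$. Applying Theorem 1 with $\tau$ replaced by $\delta$ gives

$$P_p\Big\{\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 1\} \ge n - (1 + \delta)\frac{n^2}{m}\Big\} = 1 - o(1). \qquad (81)$$

The following asymptotic approximations of the expectation and variance of the statistic $\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\}$ follow
from Lemma 6 and Lemma 8:

$$\mathsf{E}_p\Big[\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\}\Big] = \frac{n^2}{2m}(1 + o(1)),$$

$$\mathrm{Var}_p\Big[\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\}\Big] = \frac{n^2}{2m}(1 + o(1)).$$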
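Applying Chebyshev's inequality leads to

$$P_p\Big\{\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\} \ge \frac{1}{2}(1 - \delta)\frac{n^2}{m}\Big\} = 1 - o(1).$$

The claim of this lemma for $\tau = 0$ follows from combining this inequality with (81).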
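
The first-moment approximation used in this step can be checked directly. Below is a small sketch, assuming $n$ i.i.d. draws from the uniform distribution on $m$ symbols: the count of each symbol is Binomial$(n, 1/m)$, so the expected number of symbols seen exactly twice has a closed form that can be compared with $n^2/(2m)$. The function name `expected_doubletons` is introduced here for illustration.

```python
from math import comb


def expected_doubletons(n, m):
    """Exact E[# symbols appearing exactly twice] for n i.i.d. uniform draws over m symbols."""
    p = 1.0 / m
    # Each symbol count is Binomial(n, 1/m); sum P(count == 2) over the m symbols.
    return m * comb(n, 2) * p**2 * (1.0 - p) ** (n - 2)


if __name__ == "__main__":
    n, m = 1_000, 1_000_000  # small-sample regime: n much smaller than m
    exact = expected_doubletons(n, m)
    approx = n**2 / (2 * m)
    print("exact:", exact, "approx n^2/(2m):", approx, "ratio:", exact / approx)
```

The ratio approaches 1 as $n \to \infty$ with $n = o(m)$, in line with the $(1 + o(1))$ factor in the expectation.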



Next consider the case where $\tau > 0$. We first obtain a large deviations characterization of

$$S^{(2)} := \sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\}$$

by deriving an approximation to the logarithmic moment generating function. The steps are the same as those in
the proof of Proposition 4. Again, we describe the main steps and highlight the differences. First, the estimate of
the saddle point is different from that in (41) and (42). We have

$$\lambda_0 = n(1 + w),$$
Second, different from (43), we have the following expression for the moment generating function:
where



Finally, the approximation of $\mathrm{Re}(H(0))$ is different from that in (48):

m 

Re{H{0)) =n(l + w) + ^n^iY^ y'-){e^ - 1) + 0( 
The rest of the steps are the same as those in Proposition 4. We obtain

2 ™ 3 



Applying the same steps as those for the characterization of $J_F(\phi^*)$ in Theorem 1, we have

$$\lim_{n \to \infty} -\frac{m}{n^2}\log P_p\Big\{\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 2\} \ge \frac{1}{2}(1 + \tau - \delta)\frac{n^2}{m}\Big\} = J^*_F(\tau - \delta).$$

Applying Theorem 1 with $\tau$ replaced by $\tau + \delta$, we obtain

$$\lim_{n \to \infty} -\frac{m}{n^2}\log P_p\Big\{\sum_{j=1}^m \mathbb{1}\{n\Gamma^n_j = 1\} \le n - (1 + \tau + \delta)\frac{n^2}{m}\Big\} = J^*_F(\tau + \delta).$$

Note that $J^*_F(\tau + \delta) > J^*_F(\tau - \delta)$. Thus the probability that the first constraint in the definition of $B_{n,\tau,\delta}$ is violated
is negligible compared to the probability that the second constraint is satisfied. This shows that the probability of
$B_{n,\tau,\delta}$ can be approximated by the probability that the second constraint in the definition of $B_{n,\tau,\delta}$ is satisfied. This
leads to the claim of the lemma.



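B. A lower bound on the likelihood ratio for $\epsilon \ge 0.5$

When $\epsilon \ge 0.5$, we use the following construction of distributions: Let $\mathcal{U}_m$ denote the collection of all subsets of
$[m]$ whose cardinality is $\lfloor m(1 - \epsilon) \rfloor$. For each $U \in \mathcal{U}_m$, define the distribution

$$q_U(j) = \begin{cases} \dfrac{1}{\lfloor m(1 - \epsilon) \rfloor}, & j \in U, \\ 0, & j \in [m] \setminus U. \end{cases}$$

Consider the mixture $\bar q^n = \frac{1}{|\mathcal{U}_m|}\sum_{U \in \mathcal{U}_m} q_U^n$. The following bound on $\bar q^n / p^n$ holds:

Lemma 11. Suppose $\epsilon \ge 0.5$. For any sequence $z_1^n = \{z_1, \ldots, z_n\}$ satisfying $\{Z_1^n = z_1^n\} \subset B_{n,\tau,\delta}$, the following
holds:

$$\log\Big(\frac{\bar q^n}{p^n}(z_1^n)\Big) \ge -\frac{n^2}{m}\,\frac{1}{2}[\kappa(\epsilon) - \log(1 + \kappa(\epsilon))(1 + \tau - \delta)] + O\Big(\frac{n^3}{m^2}\Big).$$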



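Proof of Lemma 11: Let $S := \{j : j \text{ appears in } z_1^n\}$. Let $s = |S|$. It follows from $\{Z_1^n = z_1^n\} \subset B_{n,\tau,\delta}$ that

$$n - \frac{1}{2}\frac{n^2}{m}(1 + \tau + 3\delta) \le s \le n - \frac{1}{2}\frac{n^2}{m}(1 + \tau - \delta). \qquad (83)$$

The likelihood ratio $q_U^n / p^n$ has the expression $\frac{q_U^n}{p^n}(z_1^n) = \big(\frac{m}{\lfloor m(1 - \epsilon) \rfloor}\big)^n \mathbb{1}\{S \subseteq U\}$. Thus

$$\frac{\bar q^n}{p^n}(z_1^n) = \Big(\frac{m}{\lfloor m(1 - \epsilon) \rfloor}\Big)^n \frac{1}{|\mathcal{U}_m|}\sum_{U \in \mathcal{U}_m} \mathbb{1}\{S \subseteq U\}
= \Big(\frac{m}{\lfloor m(1 - \epsilon) \rfloor}\Big)^n \frac{\binom{m - s}{\lfloor m(1 - \epsilon) \rfloor - s}}{\binom{m}{\lfloor m(1 - \epsilon) \rfloor}}. \qquad (84)$$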

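Stirling's formula gives

$$\frac{\binom{m - s}{\lfloor m(1 - \epsilon) \rfloor - s}}{\binom{m}{\lfloor m(1 - \epsilon) \rfloor}} = \Big(\frac{\lfloor m(1 - \epsilon) \rfloor}{m}\Big)^{s} \exp\Big\{-\frac{s^2}{2m}\,\frac{\epsilon}{1 - \epsilon} + O\Big(\frac{s^3}{m^2}\Big)\Big\}(1 + o(1)).$$

Substituting this into (84) leads to

$$\log\Big(\frac{\bar q^n}{p^n}(z_1^n)\Big) = (n - s)\log\Big(\frac{m}{\lfloor m(1 - \epsilon) \rfloor}\Big) - \frac{s^2}{2m}\,\frac{\epsilon}{1 - \epsilon} + O\Big(\frac{s^3}{m^2}\Big) + o(1).$$

The claim of this lemma follows from applying the inequality (83) and the fact that $\kappa(\epsilon) = \epsilon/(1 - \epsilon)$ when $\epsilon \ge 0.5$. ■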
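
The counting identity behind the mixture likelihood ratio can be verified by brute force on a tiny alphabet. The sketch below assumes $p$ is uniform on $m$ symbols (labelled $0, \ldots, m-1$ here rather than $1, \ldots, m$) and that each $q_U$ is uniform on a subset $U$ of size $k$; the function names and the parameter `k` (playing the role of $\lfloor m(1-\epsilon) \rfloor$) are introduced for illustration only.

```python
from itertools import combinations
from math import comb


def mixture_lr_closed_form(z, m, k):
    """(m/k)^n * C(m-s, k-s) / C(m, k), where s is the number of distinct symbols in z."""
    n, s = len(z), len(set(z))
    if s > k:
        return 0.0  # no size-k subset can contain all observed symbols
    return (m / k) ** n * comb(m - s, k - s) / comb(m, k)


def mixture_lr_brute_force(z, m, k):
    """Average of q_U^n(z) / p^n(z) over all size-k subsets U of {0, ..., m-1}."""
    n, support = len(z), set(z)
    total, count = 0.0, 0
    for U in combinations(range(m), k):
        count += 1
        if support <= set(U):  # q_U^n(z) > 0 only when every observed symbol lies in U
            total += (m / k) ** n
    return total / count


if __name__ == "__main__":
    z = [0, 1, 1, 3]
    print(mixture_lr_closed_form(z, 6, 3), mixture_lr_brute_force(z, 6, 3))
```

The two functions agree because exactly $\binom{m-s}{k-s}$ of the $\binom{m}{k}$ subsets contain all $s$ observed symbols.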

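C. A lower bound on the likelihood ratio for $\epsilon < 0.5$

When $\epsilon < 0.5$, we use the following construction of distributions: Let $\mathcal{U}_m$ denote the collection of all subsets of
$[m]$ whose cardinality is $\lfloor m/2 \rfloor$. For each set $U \in \mathcal{U}_m$, define the distribution $q_U$ as

$$q_U(j) = \begin{cases} \dfrac{1 + 2\epsilon}{m}, & j \in U, \\ \dfrac{1 - 2\epsilon}{m}, & j \in [m] \setminus U. \end{cases}$$

This collection of distributions can be obtained by taking the worst-case distribution $q^*$ given in (21), and permuting
the symbols in the alphabet $[m]$.

Let $q_U^n$ be the $n$-order product of $q_U$. Define the following mixture distribution,

$$\bar q^n = \frac{1}{|\mathcal{U}_m|}\sum_{U \in \mathcal{U}_m} q_U^n.$$

The LR can be lower-bounded on $B_{n,\tau,\delta}$:

Lemma 12. Suppose $\epsilon < 0.5$. The following holds for any sequence satisfying $\{Z_1^n = z_1^n\} \subset B_{n,\tau,\delta}$:

$$\log\Big(\frac{\bar q^n}{p^n}(z_1^n)\Big) \ge -\frac{n^2}{m}\,\frac{1}{2}[\kappa(\epsilon) - \log(1 + \kappa(\epsilon))(1 + \tau - \delta)](1 + o(1)) + 2\delta\frac{n^2}{m}\log(1 - 2\epsilon).$$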



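Proof of Lemma 12: For simplicity of exposition we restrict to the case where $m$ is even. Define

$S_1 := \{j : j \text{ appears in } z_1^n \text{ exactly once}\}$,
$S_2 := \{j : j \text{ appears in } z_1^n \text{ exactly twice}\}$.

Let $s_1 = |S_1|$, $s_2 = |S_2|$. It follows from $\{Z_1^n = z_1^n\} \subset B_{n,\tau,\delta}$ that

$$n \ge s_1 \ge n - \frac{n^2}{m}(1 + \tau + \delta), \qquad s_2 \ge \frac{1}{2}\frac{n^2}{m}(1 + \tau - \delta). \qquad (85)$$

Consider any set $U \in \mathcal{U}_m$. Let $k_{U,1} = |U \cap S_1|$ and $k_{U,2} = |U \cap S_2|$. Then

$$\frac{q_U^n}{p^n}(z_1^n) \ge (1 - 2\epsilon)^n \Big(\frac{1 + 2\epsilon}{1 - 2\epsilon}\Big)^{k_{U,1} + 2k_{U,2}}.$$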
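
The per-subset bound above is easy to sanity-check on a toy sequence. The sketch below assumes $p$ is uniform on the alphabet and that $q_U$ assigns mass $(1 + 2\epsilon)/m$ to symbols in $U$ and $(1 - 2\epsilon)/m$ to the rest, so that each sample contributes a factor $1 \pm 2\epsilon$ to the likelihood ratio. The function names are hypothetical.

```python
from collections import Counter


def lr_exact(z, U, eps):
    """q_U^n(z) / p^n(z) under the assumed form of q_U and uniform p."""
    lr = 1.0
    for x in z:
        lr *= (1 + 2 * eps) if x in U else (1 - 2 * eps)
    return lr


def lr_lower_bound(z, U, eps):
    """(1 - 2 eps)^n * ((1 + 2 eps) / (1 - 2 eps))^(k1 + 2 k2).

    k1 (k2) counts symbols of U that appear exactly once (twice) in z;
    symbols appearing three or more times are credited only the factor 1 - 2 eps.
    """
    counts = Counter(z)
    k1 = sum(1 for j, c in counts.items() if c == 1 and j in U)
    k2 = sum(1 for j, c in counts.items() if c == 2 and j in U)
    return (1 - 2 * eps) ** len(z) * ((1 + 2 * eps) / (1 - 2 * eps)) ** (k1 + 2 * k2)


if __name__ == "__main__":
    z, U, eps = [0, 1, 1, 2, 2, 2, 5], {0, 1, 2}, 0.2
    print(lr_exact(z, U, eps), ">=", lr_lower_bound(z, U, eps))
```

The bound is tight when no symbol of $U$ appears three or more times, which is the typical situation in the small-sample regime.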



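Consequently,

$$\frac{\bar q^n}{p^n}(z_1^n) \ge G(s_1, s_2) \qquad (86)$$

where

$$\begin{aligned}
G(s_1, s_2) &:= \frac{1}{|\mathcal{U}_m|}(1 - 2\epsilon)^n \sum_{k_1=1}^{s_1}\sum_{k_2=1}^{s_2} \Big(\frac{1 + 2\epsilon}{1 - 2\epsilon}\Big)^{k_1}\Big(\frac{1 + 2\epsilon}{1 - 2\epsilon}\Big)^{2k_2} \big|\{U \in \mathcal{U}_m : k_{U,1} = k_1,\ k_{U,2} = k_2\}\big| \\
&= \frac{1}{|\mathcal{U}_m|}(1 - 2\epsilon)^n \sum_{k_1=1}^{s_1}\sum_{k_2=1}^{s_2} \Big(\frac{1 + 2\epsilon}{1 - 2\epsilon}\Big)^{k_1}\Big(\frac{1 + 2\epsilon}{1 - 2\epsilon}\Big)^{2k_2} \binom{s_1}{k_1}\binom{s_2}{k_2}\binom{m - (s_1 + s_2)}{m/2 - (k_1 + k_2)}. \qquad (87)
\end{aligned}$$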



The summand on the right-hand side of (87) takes its maximum value approximately when



(88) 



We apply the Laplace method to approximate the summation: Denote 



y(Ai,A2) = ( 



1 - 2e' 
Stirling's formula gives 

m-{si+S2) 
f -(fci + Ai + ^2+A2; 



S2 



m- {si+ S2) 



h + AiJ \k2 + A2j \m/2- (/C1 + A1 + /C2 + Aa)/' \m/2 



m 



m-{si+S2)\ 

-Ch+hy 



.exp{l + 0((^l^^l(^)+o(l)}. 



m 



(89) 



Let 



yi(Ai 



si 



l-2e' \ki + AiJ' \kij' 

S2 



\-2e' \k2 + A2j\k 

Note that yiki,k2) is the largest summand. Keeping only the [y^] [y^] number of terms in the summation in 
( [86l ) whose index {ki,k2) is close to (^1,^2), and applying ([89), we obtain 



p" 



^i=-rv^i A2=-rv^i 

( J] yi(Ai))( Yl y2(A2))y(0,0)exp{l+O(^)}. 

Ai=-rv^ A2=-rv^i 



(90) 



We first approximate X^a^-T^] ^1(^1)- Note that for Ai > 0, 



log(yi(Ai)) = Ai log(^) + i:iog(^^^). 



Approximating the above summation by integrals leads to 



log(yi(Ai)) = -U^-T + + '^(^^^ + ^(1)- 

S\ rvl rvl 



Approximating the summation over Ai using integrals, and applying the above approximation of yi{Ai) leads to 
^2\i(A0 = e«(Y°°e-^(^+i)^^dAi = e«(^\/^^i^M^ 

r , 1 J -00 V 



where the last equality follows from (88). A similar approximation for the summation over $\Delta_2$ holds:

y2(A2) = eO(i)V^. 

A2=-rv^i 



Substituting these into (87) gives

Gisu S2)=e<'^'>^r-^^^2{l-2en'-±^f^{'-±^r^ f''^ f''^ ( m-{s, + S2) 



l-2e' 'l-2e' \kij \k2j \m/2 - {ki + k2)j \m/2 



m 



(91) 
Stirling's formula gives the following asymptotic approximations of the combinatorial terms in (91):

'si\ (l + 2e)-^i(l-2e)^i-*i2^i 



-(1 + 0(1)), 



S2\ _ (1 + 2e)-2fc2(i _ 2e)2(fc2-s2)(i _^4£2)^.2^^ 
y/27rk2{s2 - k2)/s2 

^ -2™— exp{-^(l + o(l))}^(l + o(l)), 



1 + 0(1)), 



m/2-{ki + k2) 
2™ 



2m 



m 



(l + o(l)). 



^7n/2/ ^vrm 

Substituting these approximations and the value of ki and k2 into (91) leads to 

„2/oA2 ^3/2 

m 



<,2C2A2 3/2 

G(si,S2)=(l-2e)— 2s.^^p|_^L^(-L^^(-L))^^^l^g(^^4^2)|g^p{Q(-^)^Q( 

2m — 



Combining this with (85), (86) gives the claim of the lemma. ■
D. Proof of Theorem 2

Proof: Consider first the case $\tau > 0$. Consider any $\delta \in (0, \tau)$, and any test $\phi$ such that $J_F(\phi) \ge J^*_F(\tau)$.
Applying (77) and Lemma 10 we obtain



$$\lim_{n \to \infty} -\frac{m}{n^2}\log P_p(\{\phi_n = 0\} \cap B_{n,\tau,\delta}) = J^*_F(\tau - \delta).$$



When $\epsilon \ge 0.5$, we apply (76), (92), and Lemma 11 to obtain

JaM < l[K{e) -log{l + K{e)){l + T- 6)] + JUt- 6) 
= Jl^{r-6)+r2{6). 

where r2 again vanishes as (5 — )• 0, 

6 



(92) 



(93) 



r2(5) =l[-(51og(l + /^(£)) + (l + T)log(l ^^^^ 



51og(l + r-(^) + 5]. 



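We have used the following explicit expressions of $J^*_F$ and $J^*_M$:

$$J^*_F(\tau) = \frac{1}{2}[-\tau + (1 + \tau)\log(1 + \tau)],$$

$$J^*_M(\tau) = \frac{1}{2}\Big[\kappa(\epsilon) - \tau + (1 + \tau)\log\Big(\frac{1 + \tau}{1 + \kappa(\epsilon)}\Big)\Big].$$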
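
These two expressions can be evaluated and cross-checked with a few lines of code. The sketch below is illustrative only: it relies on the observation (an interpretation made here, not a statement from the paper) that both expressions have the form of the Cramér rate function of a Poisson variable, $I(x) = x\log(x/\lambda) - x + \lambda$, with mean $\lambda = \frac12$ for false alarm and $\lambda = \frac12(1 + \kappa(\epsilon))$ for missed detection, evaluated at $x = \frac12(1 + \tau)$. All function names are hypothetical.

```python
import math


def j_false_alarm(tau):
    """J_F*(tau) = 0.5 * [-tau + (1 + tau) * log(1 + tau)]."""
    return 0.5 * (-tau + (1.0 + tau) * math.log1p(tau))


def j_missed_detection(tau, kappa):
    """J_M*(tau) = 0.5 * [kappa - tau + (1 + tau) * log((1 + tau) / (1 + kappa))]."""
    return 0.5 * (kappa - tau + (1.0 + tau) * math.log((1.0 + tau) / (1.0 + kappa)))


def poisson_rate(mean, level):
    """Cramer rate function of a Poisson(mean) variable evaluated at level > 0."""
    return level * math.log(level / mean) - level + mean


if __name__ == "__main__":
    tau, kappa = 0.3, 1.0  # illustrative values with 0 <= tau <= kappa
    x = 0.5 * (1.0 + tau)
    print(j_false_alarm(tau), poisson_rate(0.5, x))
    print(j_missed_detection(tau, kappa), poisson_rate(0.5 * (1.0 + kappa), x))
```

Under this reading, the false-alarm exponent vanishes at $\tau = 0$ and the missed-detection exponent vanishes at $\tau = \kappa(\epsilon)$, which matches the range $\tau \in [0, \kappa(\epsilon)]$ considered in the proof outline.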



Since (93) holds for any $\delta > 0$ and $J^*_M(\tau)$ is continuous, we conclude $J_M(\phi) \le J^*_M(\tau)$.
When $\epsilon < 0.5$, we apply (76), (92), and Lemma 12 to obtain

^m(</') <l[At(e)-log(l+/i(e))(l+r-<5) + 451og(l - 2e) + J^(r - 5) 
=^M(r-<5) + ri(5). 

where 



(94) 



ri(5)=l[-51og(l + K{e)) + (1 + r)log(l 



d\og{l + T-5) + 5 + 451og(l - 2e)]. 



Since the inequality (94) holds for any $\delta > 0$, $J^*_M(\tau)$ is continuous in $\tau$, and $r_1(\delta) \to 0$ as $\delta \to 0$, we conclude
that $J_M(\phi) \le J^*_M(\tau)$.

The proof for the case where $\tau = 0$ is exactly the same as that for the case $\tau > 0$, except (79) is used in place
of (80). We omit the details.
Appendix E

Proof of Lemma 3, Lemma 4 and Lemma 5 Used in the Proof of Theorem 6

Proof of Lemma^ Applying Lemma [6] to the distribution q* G Q„ given in (22i and pT] ) gives Eg. [•S'n] 
^^[5"^] + + It follows from Chebyshev's inequality that for t„ > Ep[S'^] + ^^(e), 



p*irh^(y"-\ — M < Var g. [S^] 

'''*"'^''-"-(r,.-E,Kl-Ss(.))=' 



3* r j,P r ryn 



Thus, in order for lim„_i>oo Py{0n(-^i) = 1} = 1 to hold, we must have 

K - Ep[5„n - < Varg.[5P](l + o(l)) = 2^(1 +^(5))(1 + o(l)). 

where the last equality follows from Lemma 8. This leads to the claim of Lemma 3.
Proof of Lemma 4: Consider the statistic

^n=^n =^n- 2— M*^) + 0{—=). 



The conditional distribution of 5^ in the event A under p is the same as the distribution of S*^, under p', where 
the number of samples is n' = n — 

Lin^J and y is the uniform distribution over [m — 1] . It then follows from 

Lemma [6] and Lemma [8] that 



Ep^n Ml = Ep' =n - [ J + O - 

Varp[5P|^] = Varp,[5„P,] = 2^(1 + o(l)). 
It follows from Chebyshev's inequality, Lemma 6 and Lemma 8 that for large enough $n$,

Pp{5P<Ep[5P] + ^/s(e) + 24=|^„} 



2 2 
— nf^^ Tl Tl 

Pp{S^ + 2—K{e) < n+ — /i(e)+2— + 0(— )!>!„} 
m m x/m \/m 



2 

— —71 Tl 

P,{S^ < E,[S^\A] - + 0{^)\An} 
m \/m 



2^(1 + 0(7^)) 



^ • y/ III.- ■ ^ / > 



Proof of Lemma 5: A simple combinatorial argument gives



Applying Stirling's formula and substituting pi = ^ leads to 



Pp{An} = exp{-i^^^^^^ log(m)(l + o(l))}(l + o(l)). (95) 



Since m = o( iog"^)2 ) and m = o(n^), we have 



log(mj = — ^—j= — o(21og(n)) = o( — ). 



m \/m m 



Substituting this into (95) leads to the claim of this lemma.